Search Results: "evan"

25 February 2024

Jacob Adams: AAC and Debian

Currently, in a default installation of Debian with the GNOME desktop, Bluetooth headphones that require the AAC codec1 cannot be used. As the Debian wiki outlines, using the AAC codec over Bluetooth, while technically supported by PipeWire, is explicitly disabled in Debian at this time. This is because the fdk-aac library needed to enable this support is currently in the non-free component of the repository, meaning that PipeWire, which is in the main component, cannot depend on it.

How to Fix it Yourself If what you, like me, need is simply for Bluetooth Audio to work with AAC in Debian s default desktop environment2, then you ll need to rebuild the pipewire package to include the AAC codec. While the current version in Debian main has been built with AAC deliberately disabled, it is trivial to enable if you can install a version of the fdk-aac library. I preface this with the usual caveats when it comes to patent and licensing controversies. I am not a lawyer, building this package and/or using it could get you into legal trouble. These instructions have only been tested on an up-to-date copy of Debian 12.
  1. Install pipewire s build dependencies
    sudo apt install build-essential devscripts
    sudo apt build-dep pipewire
    
  2. Install libfdk-aac-dev
    sudo apt install libfdk-aac-dev
    
    If the above doesn t work you ll likely need to enable non-free and try again
    sudo sed -i 's/main/main non-free/g' /etc/apt/sources.list
    sudo apt update
    
    Alternatively, if you wish to ensure you are maximally license-compliant and patent un-infringing3, you can instead build fdk-aac-free which includes only those components of AAC that are known to be patent-free3. This is what should eventually end up in Debian to resolve this problem (see below).
    sudo apt install git-buildpackage
    mkdir fdk-aac-source
    cd fdk-aac-source
    git clone https://salsa.debian.org/multimedia-team/fdk-aac
    cd fdk-aac
    gbp buildpackage
    sudo dpkg -i ../libfdk-aac2_*deb ../libfdk-aac-dev_*deb
    
  3. Get the pipewire source code
    mkdir pipewire-source
    cd pipewire-source
    apt source pipewire
    
    This will create a bunch of files within the pipewire-source directory, but you ll only need the pipewire-<version> folder, this contains all the files you ll need to build the package, with all the debian-specific patches already applied. Note that you don t want to run the apt source command as root, as it will then create files that your regular user cannot edit.
  4. Fix the dependencies and build options To fix up the build scripts to use the fdk-aac library, you need to save the following as pipewire-source/aac.patch
    --- debian/control.orig
    +++ debian/control
    @@ -40,8 +40,8 @@
                 modemmanager-dev,
                 pkg-config,
                 python3-docutils,
    -               systemd [linux-any]
    -Build-Conflicts: libfdk-aac-dev
    +               systemd [linux-any],
    +               libfdk-aac-dev
     Standards-Version: 4.6.2
     Vcs-Browser: https://salsa.debian.org/utopia-team/pipewire
     Vcs-Git: https://salsa.debian.org/utopia-team/pipewire.git
    --- debian/rules.orig
    +++ debian/rules
    @@ -37,7 +37,7 @@
     		-Dauto_features=enabled \
     		-Davahi=enabled \
     		-Dbluez5-backend-native-mm=enabled \
    -		-Dbluez5-codec-aac=disabled \
    +		-Dbluez5-codec-aac=enabled \
     		-Dbluez5-codec-aptx=enabled \
     		-Dbluez5-codec-lc3=enabled \
     		-Dbluez5-codec-lc3plus=disabled \
    
    Then you ll need to run patch from within the pipewire-<version> folder created by apt source:
    patch -p0 < ../aac.patch
    
  5. Build pipewire
    cd pipewire-*
    debuild
    
    Note that you will likely see an error from debsign at the end of this process, this is harmless, you simply don t have a GPG key set up to sign your newly-built package4. Packages don t need to be signed to be installed, and debsign uses a somewhat non-standard signing process that dpkg does not check anyway.
  1. Install libspa-0.2-bluetooth
    sudo dpkg -i libspa-0.2-bluetooth_*.deb
    
  2. Restart PipeWire and/or Reboot
    sudo reboot
    
    Theoretically there s a set of services to restart here that would get pipewire to pick up the new library, probably just pipewire itself. But it s just as easy to restart and ensure everything is using the correct library.

Why This is a slightly unusual situation, as the fdk-aac library is licensed under what even the GNU project acknowledges is a free software license. However, this license explicitly informs the user that they need to acquire a patent license to use this software5:
3. NO PATENT LICENSE NO EXPRESS OR IMPLIED LICENSES TO ANY PATENT CLAIMS, including without limitation the patents of Fraunhofer, ARE GRANTED BY THIS SOFTWARE LICENSE. Fraunhofer provides no warranty of patent non-infringement with respect to this software. You may use this FDK AAC Codec software or modifications thereto only for purposes that are authorized by appropriate patent licenses.
To quote the GNU project:
Because of this, and because the license author is a known patent aggressor, we encourage you to be careful about using or redistributing software under this license: you should first consider whether the licensor might aim to lure you into patent infringement.
AAC is covered by a number of patents, which expire at some point in the 2030s6. As such the current version of the library is potentially legally dubious to ship with any other software, as it could be considered patent-infringing3.

Fedora s solution Since 2017, Fedora has included a modified version of the library as fdk-aac-free, see the announcement and the bugzilla bug requesting review. This version of the library includes only the AAC LC profile, which is believed to be entirely patent-free3. Based on this, there is an open bug report in Debian requesting that the fdk-aac package be moved to the main component and that the pipwire package be updated to build against it.

The Debian NEW queue To resolve these bugs, a version of fdk-aac-free has been uploaded to Debian by Jeremy Bicha. However, to make it into Debian proper, it must first pass through the ftpmaster s NEW queue. The current version of fdk-aac-free has been in the NEW queue since July 2023. Based on conversations in some of the bugs above, it s been there since at least 20227. I hope this helps anyone stuck with AAC to get their hardware working for them while we wait for the package to eventually make it through the NEW queue. Discuss on Hacker News
  1. Such as, for example, any Apple AirPods, which only support AAC AFAICT.
  2. Which, as of Debian 12 is GNOME 3 under Wayland with PipeWire.
  3. I m not a lawyer, I don t know what kinds of infringement might or might not be possible here, do your own research, etc. 2 3 4
  4. And if you DO have a key setup with debsign you almost certainly don t need these instructions.
  5. This was originally phrased as explicitly does not grant any patent rights. It was pointed out on Hacker News that this is not exactly what it says, as it also includes a specific note that you ll need to acquire your own patent license. I ve now quoted the relevant section of the license for clarity.
  6. Wikipedia claims the base patents expire in 2031, with the extensions expiring in 2038, but its source for these claims is some guy s spreadsheet in a forum. The same discussion also brings up Wikipedia s claim and casts some doubt on it, so I m not entirely sure what s correct here, but I didn t feel like doing a patent deep-dive today. If someone can provide a clear answer that would be much appreciated.
  7. According to Jeremy B cha: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1021370#17

21 February 2024

Niels Thykier: Expanding on the Language Server (LSP) support for debian/control

I have spent some more time on improving my language server for debian/control. Today, I managed to provide the following features:
  • The X- style prefixes for field names are now understood and handled. This means the language server now considers XC-Package-Type the same as Package-Type.

  • More diagnostics:

    • Fields without values now trigger an error marker
    • Duplicated fields now trigger an error marker
    • Fields used in the wrong paragraph now trigger an error marker
    • Typos in field names or values now trigger a warning marker. For field names, X- style prefixes are stripped before typo detection is done.
    • The value of the Section field is now validated against a dataset of known sections and trigger a warning marker if not known.
  • The "on-save trim end of line whitespace" now works. I had a logic bug in the server side code that made it submit "no change" edits to the editor.

  • The language server now provides "hover" documentation for field names. There is a small screenshot of this below. Sadly, emacs does not support markdown or, if it does, it does not announce the support for markdown. For now, all the documentation is always in markdown format and the language server will tag it as either markdown or plaintext depending on the announced support.

  • The language server now provides quick fixes for some of the more trivial problems such as deprecated fields or typos of fields and values.

  • Added more known fields including the XS-Autobuild field for non-free packages along with a link to the relevant devref section in its hover doc.

This covers basically all my known omissions from last update except spellchecking of the Description field. An image of emacs showing documentation for the Provides field from the language server.
Spellchecking Personally, I feel spellchecking would be a very welcome addition to the current feature set. However, reviewing my options, it seems that most of the spellchecking python libraries out there are not packaged for Debian, or at least not other the name I assumed they would be. The alternative is to pipe the spellchecking to another program like aspell list. I did not test this fully, but aspell list does seem to do some input buffering that I cannot easily default (at least not in the shell). Though, either way, the logic for this will not be trivial and aspell list does not seem to include the corrections either. So best case, you would get typo markers but no suggestions for what you should have typed. Not ideal. Additionally, I am also concerned with the performance for this feature. For d/control, it will be a trivial matter in practice. However, I would be reusing this for d/changelog which is 99% free text with plenty of room for typos. For a regular linter, some slowness is acceptable as it is basically a batch tool. However, for a language server, this potentially translates into latency for your edits and that gets annoying. While it is definitely on my long term todo list, I am a bit afraid that it can easily become a time sink. Admittedly, this does annoy me, because I wanted to cross off at least one of Otto's requested features soon.
On wrap-and-sort support The other obvious request from Otto would be to automate wrap-and-sort formatting. Here, the problem is that "we" in Debian do not agree on the one true formatting of debian/control. In fact, I am fairly certain we do not even agree on whether we should all use wrap-and-sort. This implies we need a style configuration. However, if we have a style configuration per person, then you get style "ping-pong" for packages where the co-maintainers do not all have the same style configuration. Additionally, it is very likely that you are a member of multiple packaging teams or groups that all have their own unique style. Ergo, only having a personal config file is doomed to fail. The only "sane" option here that I can think of is to have or support "per package" style configuration. Something that would be committed to git, so the tooling would automatically pick up the configuration. Obviously, that is not fun for large packaging teams where you have to maintain one file per package if you want a consistent style across all packages. But it beats "style ping-pong" any day of the week. Note that I am perfectly open to having a personal configuration file as a fallback for when the "per package" configuration file is absent. The second problem is the question of which format to use and what to name this file. Since file formats and naming has never been controversial at all, this will obviously be the easy part of this problem. But the file should be parsable by both wrap-and-sort and the language server, so you get the same result regardless of which tool you use. If we do not ensure this, then we still have the style ping-pong problem as people use different tools. This also seems like time sink with no end. So, what next then...?
What next? On the language server front, I will have a look at its support for providing semantic hints to the editors that might be used for syntax highlighting. While I think most common Debian editors have built syntax highlighting already, I would like this language server to stand on its own. I would like us to be in a situation where we do not have implement yet another editor extension for Debian packaging files. At least not for editors that support the LSP spec. On a different front, I have an idea for how we go about relationship related substvars. It is not directly related to this language server, except I got triggered by the language server "missing" a diagnostic for reminding people to add the magic Depends: $ misc:Depends [, $ shlibs:Depends ] boilerplate. The magic boilerplate that you have to write even though we really should just fix this at a tooling level instead. Energy permitting, I will formulate a proposal for that and send it to debian-devel. Beyond that, I think I might start adding support for another file. I also need to wrap up my python-debian branch, so I can get the position support into the Debian soon, which would remove one papercut for using this language server. Finally, it might be interesting to see if I can extract a "batch-linter" version of the diagnostics and related quickfix features. If nothing else, the "linter" variant would enable many of you to get a "mini-Lintian" without having to do a package build first.

20 February 2024

Niels Thykier: Language Server (LSP) support for debian/control

About a month ago, Otto Kek l inen asked for editor extensions for debian related files on the debian-devel mailing list. In that thread, I concluded that what we were missing was a "Language Server" (LSP) for our packaging files. Last week, I started a prototype for such a LSP for the debian/control file as a starting point based on the pygls library. The initial prototype worked and I could do very basic diagnostics plus completion suggestion for field names.
Current features I got 4 basic features implemented, though I have only been able to test two of them in emacs.
  • Diagnostics or linting of basic issues.
  • Completion suggestions for all known field names that I could think of and values for some fields.
  • Folding ranges (untested). This feature enables the editor to "fold" multiple lines. It is often used with multi-line comments and that is the feature currently supported.
  • On save, trim trailing whitespace at the end of lines (untested). Might not be registered correctly on the server end.
Despite its very limited feature set, I feel editing debian/control in emacs is now a much more pleasant experience. Coming back to the features that Otto requested, the above covers a grand total of zero. Sorry, Otto. It is not you, it is me.
Completion suggestions For completion, all known fields are completed. Place the cursor at the start of the line or in a partially written out field name and trigger the completion in your editor. In my case, I can type R-R-R and trigger the completion and the editor will automatically replace it with Rules-Requires-Root as the only applicable match. Your milage may vary since I delegate most of the filtering to the editor, meaning the editor has the final say about whether your input matches anything. The only filtering done on the server side is that the server prunes out fields already used in the paragraph, so you are not presented with the option to repeat an already used field, which would be an error. Admittedly, not an error the language server detects at the moment, but other tools will. When completing field, if the field only has one non-default value such as Essential which can be either no (the default, but you should not use it) or yes, then the completion suggestion will complete the field along with its value. This is mostly only applicable for "yes/no" fields such as Essential and Protected. But it does also trigger for Package-Type at the moment. As for completing values, here the language server can complete the value for simple fields such as "yes/no" fields, Multi-Arch, Package-Type and Priority. I intend to add support for Section as well - maybe also Architecture.
Diagnostics On the diagnostic front, I have added multiple diagnostics:
  • An error marker for syntax errors.
  • An error marker for missing a mandatory field like Package or Architecture. This also includes Standards-Version, which is admittedly mandatory by policy rather than tooling falling part.
  • An error marker for adding Multi-Arch: same to an Architecture: all package.
  • Error marker for providing an unknown value to a field with a set of known values. As an example, writing foo in Multi-Arch would trigger this one.
  • Warning marker for using deprecated fields such as DM-Upload-Allowed, or when setting a field to its default value for fields like Essential. The latter rule only applies to selected fields and notably Multi-Arch: no does not trigger a warning.
  • Info level marker if a field like Priority duplicates the value of the Source paragraph.
Notable omission at this time:
  • No errors are raised if a field does not have a value.
  • No errors are raised if a field is duplicated inside a paragraph.
  • No errors are used if a field is used in the wrong paragraph.
  • No spellchecking of the Description field.
  • No understanding that Foo and X[CBS]-Foo are related. As an example, XC-Package-Type is completely ignored despite being the old name for Package-Type.
  • Quick fixes to solve these problems... :)
Trying it out If you want to try, it is sadly a bit more involved due to things not being uploaded or merged yet. Also, be advised that I will regularly rebase my git branches as I revise the code. The setup:
  • Build and install the deb of the main branch of pygls from https://salsa.debian.org/debian/pygls The package is in NEW and hopefully this step will soon just be a regular apt install.
  • Build and install the deb of the rts-locatable branch of my python-debian fork from https://salsa.debian.org/nthykier/python-debian There is a draft MR of it as well on the main repo.
  • Build and install the deb of the lsp-support branch of debputy from https://salsa.debian.org/debian/debputy
  • Configure your editor to run debputy lsp debian/control as the language server for debian/control. This is depends on your editor. I figured out how to do it for emacs (see below). I also found a guide for neovim at https://neovim.io/doc/user/lsp. Note that debputy can be run from any directory here. The debian/control is a reference to the file format and not a concrete file in this case.
Obviously, the setup should get easier over time. The first three bullet points should eventually get resolved by merges and upload meaning you end up with an apt install command instead of them. For the editor part, I would obviously love it if we can add snippets for editors to make the automatically pick up the language server when the relevant file is installed.
Using the debputy LSP in emacs The guide I found so far relies on eglot. The guide below assumes you have the elpa-dpkg-dev-el package installed for the debian-control-mode. Though it should be a trivially matter to replace debian-control-mode with a different mode if you use a different mode for your debian/control file. In your emacs init file (such as ~/.emacs or ~/.emacs.d/init.el), you add the follow blob.
(with-eval-after-load 'eglot
    (add-to-list 'eglot-server-programs
        '(debian-control-mode . ("debputy" "lsp" "debian/control"))))
Once you open the debian/control file in emacs, you can type M-x eglot to activate the language server. Not sure why that manual step is needed and if someone knows how to automate it such that eglot activates automatically on opening debian/control, please let me know. For testing completions, I often have to manually activate them (with C-M-i or M-x complete-symbol). Though, it is a bit unclear to me whether this is an emacs setting that I have not toggled or something I need to do on the language server side.
From here As next steps, I will probably look into fixing some of the "known missing" items under diagnostics. The quick fix would be a considerable improvement to assisting users. In the not so distant future, I will probably start to look at supporting other files such as debian/changelog or look into supporting configuration, so I can cover formatting features like wrap-and-sort. I am also very much open to how we can provide integrations for this feature into editors by default. I will probably create a separate binary package for specifically this feature that pulls all relevant dependencies that would be able to provide editor integrations as well.

16 February 2024

Melissa Wen: Keep an eye out: We are preparing the 2024 Linux Display Next Hackfest!

Igalia is preparing the 2024 Linux Display Next Hackfest and we are thrilled to announce that this year s hackfest will take place from May 14th to 16th at our HQ in A Coru a, Spain. This unconference-style event aims to bring together the most relevant players in the Linux display community to tackle current challenges and chart the future of the display stack. Key goals for the hackfest include: The hackfest fosters an intimate and focused environment to brainstorm, hack, and design solutions alongside fellow display experts. Participants will dive into discussions, tinker with code, and contribute to shaping the future of the Linux display stack. More details are available on the official website. Stay tuned! Keep an eye out for more information, mark your calendars and start prepping your hacking gear.

7 February 2024

Reproducible Builds: Reproducible Builds in January 2024

Welcome to the January 2024 report from the Reproducible Builds project. In these reports we outline the most important things that we have been up to over the past month. If you are interested in contributing to the project, please visit our Contribute page on our website.

How we executed a critical supply chain attack on PyTorch John Stawinski and Adnan Khan published a lengthy blog post detailing how they executed a supply-chain attack against PyTorch, a popular machine learning platform used by titans like Google, Meta, Boeing, and Lockheed Martin :
Our exploit path resulted in the ability to upload malicious PyTorch releases to GitHub, upload releases to [Amazon Web Services], potentially add code to the main repository branch, backdoor PyTorch dependencies the list goes on. In short, it was bad. Quite bad.
The attack pivoted on PyTorch s use of self-hosted runners as well as submitting a pull request to address a trivial typo in the project s README file to gain access to repository secrets and API keys that could subsequently be used for malicious purposes.

New Arch Linux forensic filesystem tool On our mailing list this month, long-time Reproducible Builds developer kpcyrd announced a new tool designed to forensically analyse Arch Linux filesystem images. Called archlinux-userland-fs-cmp, the tool is supposed to be used from a rescue image (any Linux) with an Arch install mounted to, [for example], /mnt. Crucially, however, at no point is any file from the mounted filesystem eval d or otherwise executed. Parsers are written in a memory safe language. More information about the tool can be found on their announcement message, as well as on the tool s homepage. A GIF of the tool in action is also available.

Issues with our SOURCE_DATE_EPOCH code? Chris Lamb started a thread on our mailing list summarising some potential problems with the source code snippet the Reproducible Builds project has been using to parse the SOURCE_DATE_EPOCH environment variable:
I m not 100% sure who originally wrote this code, but it was probably sometime in the ~2015 era, and it must be in a huge number of codebases by now. Anyway, Alejandro Colomar was working on the shadow security tool and pinged me regarding some potential issues with the code. You can see this conversation here.
Chris ends his message with a request that those with intimate or low-level knowledge of time_t, C types, overflows and the various parsing libraries in the C standard library (etc.) contribute with further info.

Distribution updates In Debian this month, Roland Clobus posted another detailed update of the status of reproducible ISO images on our mailing list. In particular, Roland helpfully summarised that all major desktops build reproducibly with bullseye, bookworm, trixie and sid provided they are built for a second time within the same DAK run (i.e. [within] 6 hours) . Additionally 7 of the 8 bookworm images from the official download link build reproducibly at any later time. In addition to this, three reviews of Debian packages were added, 17 were updated and 15 were removed this month adding to our knowledge about identified issues. Elsewhere, Bernhard posted another monthly update for his work elsewhere in openSUSE.

Community updates There were made a number of improvements to our website, including Bernhard M. Wiedemann fixing a number of typos of the term nondeterministic . [ ] and Jan Zerebecki adding a substantial and highly welcome section to our page about SOURCE_DATE_EPOCH to document its interaction with distribution rebuilds. [ ].
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes such as uploading versions 254 and 255 to Debian but focusing on triaging and/or merging code from other contributors. This included adding support for comparing eXtensible ARchive (.XAR/.PKG) files courtesy of Seth Michael Larson [ ][ ], as well considerable work from Vekhir in order to fix compatibility between various and subtle incompatible versions of the progressbar libraries in Python [ ][ ][ ][ ]. Thanks!

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In January, a number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Reduce the number of arm64 architecture workers from 24 to 16. [ ]
    • Use diffoscope from the Debian release being tested again. [ ]
    • Improve the handling when killing unwanted processes [ ][ ][ ] and be more verbose about it, too [ ].
    • Don t mark a job as failed if process marked as to-be-killed is already gone. [ ]
    • Display the architecture of builds that have been running for more than 48 hours. [ ]
    • Reboot arm64 nodes when they hit an OOM (out of memory) state. [ ]
  • Package rescheduling changes:
    • Reduce IRC notifications to 1 when rescheduling due to package status changes. [ ]
    • Correctly set SUDO_USER when rescheduling packages. [ ]
    • Automatically reschedule packages regressing to FTBFS (build failure) or FTBR (build success, but unreproducible). [ ]
  • OpenWrt-related changes:
    • Install the python3-dev and python3-pyelftools packages as they are now needed for the sunxi target. [ ][ ]
    • Also install the libpam0g-dev which is needed by some OpenWrt hardware targets. [ ]
  • Misc:
    • As it s January, set the real_year variable to 2024 [ ] and bump various copyright years as well [ ].
    • Fix a large (!) number of spelling mistakes in various scripts. [ ][ ][ ]
    • Prevent Squid and Systemd processes from being killed by the kernel s OOM killer. [ ]
    • Install the iptables tool everywhere, else our custom rc.local script fails. [ ]
    • Cleanup the /srv/workspace/pbuilder directory on boot. [ ]
    • Automatically restart Squid if it fails. [ ]
    • Limit the execution of chroot-installation jobs to a maximum of 4 concurrent runs. [ ][ ]
Significant amounts of node maintenance was performed by Holger Levsen (eg. [ ][ ][ ][ ][ ][ ][ ] etc.) and Vagrant Cascadian (eg. [ ][ ][ ][ ][ ][ ][ ][ ]). Indeed, Vagrant Cascadian handled an extended power outage for the network running the Debian armhf architecture test infrastructure. This provided the incentive to replace the UPS batteries and consolidate infrastructure to reduce future UPS load. [ ] Elsewhere in our infrastructure, however, Holger Levsen also adjusted the email configuration for @reproducible-builds.org to deal with a new SMTP email attack. [ ]

Upstream patches The Reproducible Builds project tries to detects, dissects and fix as many (currently) unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: Separate to this, Vagrant Cascadian followed up with the relevant maintainers when reproducibility fixes were not included in newly-uploaded versions of the mm-common package in Debian this was quickly fixed, however. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

28 January 2024

Niels Thykier: Annotating the Debian packaging directory

In my previous blog post Providing online reference documentation for debputy, I made a point about how debhelper documentation was suboptimal on account of being static rather than online. The thing is that debhelper is not alone in this problem space, even if it is a major contributor to the number of packaging files you have to to know about. If we look at the "competition" here such as Fedora and Arch Linux, they tend to only have one packaging file. While most Debian people will tell you a long list of cons about having one packaging file (such a Fedora's spec file being 3+ domain specific languages "mashed" into one file), one major advantage is that there is only "the one packaging file". You only need to remember where to find the documentation for one file, which is great when you are running on wetware with limited storage capacity. Which means as a newbie, you can dedicate less mental resources to tracking multiple files and how they interact and more effort understanding the "one file" at hand. I started by asking myself how can we in Debian make the packaging stack more accessible to newcomers? Spoiler alert, I dug myself into rabbit hole and ended up somewhere else than where I thought I was going. I started by wanting to scan the debian directory and annotate all files that I could with documentation links. The logic was that if debputy could do that for you, then you could spend more mental effort elsewhere. So I combined debputy's packager provided files detection with a static list of files and I quickly had a good starting point for debputy-based packages.
Adding (non-static) dpkg and debhelper files to the mix Now, I could have closed the topic here and said "Look, I did debputy files plus couple of super common files". But I decided to take it a bit further. I added support for handling some dpkg files like packager provided files (such as debian/substvars and debian/symbols). But even then, we all know that debhelper is the big hurdle and a major part of the omission... In another previous blog post (A new Debian package helper: debputy), I made a point about how debputy could list all auxiliary files while debhelper could not. This was exactly the kind of feature that I would need for this feature, if this feature was to cover debhelper. Now, I also remarked in that blog post that I was not willing to maintain such a list. Also, I may have ranted about static documentation being unhelpful for debhelper as it excludes third-party provided tooling. Fortunately, a recent update to dh_assistant had provided some basic plumbing for loading dh sequences. This meant that getting a list of all relevant commands for a source package was a lot easier than it used to be. Once you have a list of commands, it would be possible to check all of them for dh's NOOP PROMISE hints. In these hints, a command can assert it does nothing if a given pkgfile is not present. This lead to the new dh_assistant list-guessed-dh-config-files command that will list all declared pkgfiles and which helpers listed them. With this combined feature set in place, debputy could call dh_assistant to get a list of pkgfiles, pretend they were packager provided files and annotate those along with manpage for the relevant debhelper command. The exciting thing about letting debpputy resolve the pkgfiles is that debputy will resolve "named" files automatically (debhelper tools will only do so when --name is passed), so it is much more likely to detect named pkgfiles correctly too. Side note: I am going to ignore the elephant in the room for now, which is dh_installsystemd and its package@.service files and the wide-spread use of debian/foo.service where there is no package called foo. For the latter case, the "proper" name would be debian/pkg.foo.service. With the new dh_assistant feature done and added to debputy, debputy could now detect the ubiquitous debian/install file. Excellent. But less great was that the very common debian/docs file was not. Turns out that dh_installdocs cannot be skipped by dh, so it cannot have NOOP PROMISE hints. Meh... Well, dh_assistant could learn about a new INTROSPECTABLE marker in addition to the NOOP PROMISE and then I could sprinkle that into a few commands. Indeed that worked and meant that debian/postinst (etc.) are now also detectable. At this point, debputy would be able to identify a wide range of debhelper related configuration files in debian/ and at least associate each of them with one or more commands. Nice, surely, this would be a good place to stop, right...?
Adding more metadata to the files The debhelper detected files only had a command name and manpage URI to that command. It would be nice if we could contextualize this a bit more. Like is this file installed into the package as is like debian/pam or is it a file list to be processed like debian/install. To make this distinction, I could add the most common debhelper file types to my static list and then merge the result together. Except, I do not want to maintain a full list in debputy. Fortunately, debputy has a quite extensible plugin infrastructure, so added a new plugin feature to provide this kind of detail and now I can outsource the problem! I split my definitions into two and placed the generic ones in the debputy-documentation plugin and moved the debhelper related ones to debhelper-documentation. Additionally, third-party dh addons could provide their own debputy plugin to add context to their configuration files. So, this gave birth file categories and configuration features, which described each file on different fronts. As an example, debian/gbp.conf could be tagged as a maint-config to signal that it is not directly related to the package build but more of a tool or style preference file. On the other hand, debian/install and debian/debputy.manifest would both be tagged as a pkg-helper-config. Files like debian/pam were tagged as ppf-file for packager provided file and so on. I mentioned configuration features above and those were added because, I have had a beef with debhelper's "standard" configuration file format as read by filearray and filedoublearray. They are often considered simple to understand, but it is hard to know how a tool will actually read the file. As an example, consider the following:
  • Will the debhelper use filearray, filedoublearray or none of them to read the file? This topic has about 2 bits of entropy.
  • Will the config file be executed if it is marked executable assuming you are using the right compat level? If it is executable, does dh-exec allow renaming for this file? This topic adds 1 or 2 bit of entropy depending on the context.
  • Will the config file be subject to glob expansions? This topic sounds like a boolean but is a complicated mess. The globs can be handled either by debhelper as it parses the file for you. In this case, the globs are applied to every token. However, this is not what dh_install does. Here the last token on each line is supposed to be a directory and therefore not subject to globs. Therefore, dh_install does the globbing itself afterwards but only on part of the tokens. So that is about 2 bits of entropy more. Actually, it gets worse...
    • If the file is executed, debhelper will refuse to expand globs in the output of the command, which was a deliberate design choice by the original debhelper maintainer took when he introduced the feature in debhelper/8.9.12. Except, dh_install feature interacts with the design choice and does enable glob expansion in the tool output, because it does so manually after its filedoublearray call.
So these "simple" files have way too many combinations of how they can be interpreted. I figured it would be helpful if debputy could highlight these difference, so I added support for those as well. Accordingly, debian/install is tagged with multiple tags including dh-executable-config and dh-glob-after-execute. Then, I added a datatable of these tags, so it would be easy for people to look up what they meant. Ok, this seems like a closed deal, right...?
Context, context, context However, the dh-executable-config tag among other are only applicable in compat 9 or later. It does not seem newbie friendly if you are told that this feature exist, but then have to read in the extended description that that it actually does not apply to your package. This problem seems fixable. Thanks to dh_assistant, it is easy to figure out which compat level the package is using. Then tweak some metadata to enable per compat level rules. With that tags like dh-executable-config only appears for packages using compat 9 or later. Also, debputy should be able to tell you where packager provided files like debian/pam are installed. We already have the logic for packager provided files that debputy supports and I am already using debputy engine for detecting the files. If only the plugin provided metadata gave me the install pattern, debputy would be able tell you where this file goes in the package. Indeed, a bit of tweaking later and setting install-pattern to usr/lib/pam.d/ name , debputy presented me with the correct install-path with the package name placing the name placeholder. Now, I have been using debian/pam as an example, because debian/pam is installed into usr/lib/pam.d in compat 14. But in earlier compat levels, it was installed into etc/pam.d. Well, I already had an infrastructure for doing compat file tags. Off we go to add install-pattern to the complat level infrastructure and now changing the compat level would change the path. Great. (Bug warning: The value is off-by-one in the current version of debhelper. This is fixed in git) Also, while we are in this install-pattern business, a number of debhelper config files causes files to be installed into a fixed directory. Like debian/docs which causes file to be installed into /usr/share/docs/ package . Surely, we can expand that as well and provide that bit of context too... and done. (Bug warning: The code currently does not account for the main documentation package context) It is rather common pattern for people to do debian/foo.in files, because they want to custom generation of debian/foo. Which means if you have debian/foo you get "Oh, let me tell you about debian/foo ". Then you rename it to debian/foo.in and the result is "debian/foo.in is a total mystery to me!". That is suboptimal, so lets detect those as well as if they were the original file but add a tag saying that they are a generate template and which file we suspect it generates. Finally, if you use debputy, almost all of the standard debhelper commands are removed from the sequence, since debputy replaces them. It would be weird if these commands still contributed configuration files when they are not actually going to be invoked. This mostly happened naturally due to the way the underlying dh_assistant command works. However, any file mentioned by the debhelper-documentation plugin would still appear unfortunately. So off I went to filter the list of known configuration files against which dh_ commands that dh_assistant thought would be used for this package.
Wrapping it up I was several layers into this and had to dig myself out. I have ended up with a lot of data and metadata. But it was quite difficult for me to arrange the output in a user friendly manner. However, all this data did seem like it would be useful any tool that wants to understand more about the package. So to get out of the rabbit hole, I for now wrapped all of this into JSON and now we have a debputy tool-support annotate-debian-directory command that might be useful for other tools. To try it out, you can try the following demo: In another day, I will figure out how to structure this output so it is useful for non-machine consumers. Suggestions are welcome. :)
Limitations of the approach As a closing remark, I should probably remind people that this feature relies heavily on declarative features. These include:
  • When determining which commands are relevant, using Build-Depends: dh-sequence-foo is much more reliable than configuring it via the Turing complete configuration we call debian/rules.
  • When debhelper commands use NOOP promise hints, dh_assistant can "see" the config files listed those hints, meaning the file will at least be detected. For new introspectable hint and the debputy plugin, it is probably better to wait until the dust settles a bit before adding any of those.
You can help yourself and others to better results by using the declarative way rather than using debian/rules, which is the bane of all introspection!

26 January 2024

Bastian Venthur: Investigating popularity of Python build backends over time

Inspired by a Mastodon post by Fran oise Conil, who investigated the current popularity of build backends used in pyproject.toml files, I wanted to investigate how the popularity of build backends used in pyproject.toml files evolved over the years since the introduction of PEP-0517 in 2015. Getting the data Tom Forbes provides a huge dataset that contains information about every file within every release uploaded to PyPI. To get the current dataset, we can use:
curl -L --remote-name-all $(curl -L "https://github.com/pypi-data/data/raw/main/links/dataset.txt")
This will download approximately 30GB of parquet files, providing detailed information about each file included in a PyPI upload, including:
  1. project name, version and release date
  2. file path, size and line count
  3. hash of the file
The dataset does not contain the actual files themselves though, more on that in a moment. Querying the dataset using duckdb We can now use duckdb to query the parquet files directly. Let s look into the schema first:
describe select * from '*.parquet';
 
    column_name     column_type    null    
      varchar         varchar     varchar  
 
  project_name      VARCHAR       YES      
  project_version   VARCHAR       YES      
  project_release   VARCHAR       YES      
  uploaded_on       TIMESTAMP     YES      
  path              VARCHAR       YES      
  archive_path      VARCHAR       YES      
  size              UBIGINT       YES      
  hash              BLOB          YES      
  skip_reason       VARCHAR       YES      
  lines             UBIGINT       YES      
  repository        UINTEGER      YES      
 
  11 rows                       6 columns  
 
From all files mentioned in the dataset, we only care about pyproject.toml files that are in the project s root directory. Since we ll still have to download the actual files, we need to get the path and the repository to construct the corresponding URL to the mirror that contains all files in a bunch of huge git repositories. Some files are not available on the mirrors; to skip these, we only take files where the skip_reason is empty. We also care about the timestamp of the upload (uploaded_on) and the hash to avoid processing identical files twice:
select
    path,
    hash,
    uploaded_on,
    repository
from '*.parquet'
where
    skip_reason == '' and
    lower(string_split(path, '/')[-1]) == 'pyproject.toml' and
    len(string_split(path, '/')) == 5
order by uploaded_on desc
This query runs for a few minutes on my laptop and returns ~1.2M rows. Getting the actual files Using the repository and path, we can now construct an URL from which we can fetch the actual file for further processing:
url = f"https://raw.githubusercontent.com/pypi-data/pypi-mirror- repository /code/ path "
We can download the individual pyproject.toml files and parse them to read the build-backend into a dictionary mapping the file-hash to the build backend. Downloads on GitHub are rate-limited, so downloading 1.2M files will take a couple of days. By skipping files with a hash we ve already processed, we can avoid downloading the same file more than once, cutting the required downloads by circa 50%. Results Assuming the data is complete and my analysis is sound, these are the findings: There is a surprising amount of build backends in use, but the overall amount of uploads per build backend decreases quickly, with a long tail of single uploads:
>>> results.backend.value_counts()
backend
setuptools        701550
poetry            380830
hatchling          56917
flit               36223
pdm                11437
maturin             9796
jupyter             1707
mesonpy              625
scikit               556
                   ...
postry                 1
tree                   1
setuptoos              1
neuron                 1
avalon                 1
maturimaturinn         1
jsonpath               1
ha                     1
pyo3                   1
Name: count, Length: 73, dtype: int64
We pick only the top 4 build backends, and group the remaining ones (including PDM and Maturin) into other so they are accounted for as well. The following plot shows the relative distribution of build backends over time. Each bin represents a time span of 28 days. I chose 28 days to reduce visual clutter. Within each bin, the height of the bars corresponds to the relative proportion of uploads during that time interval: Relative distribution of build backends over time Looking at the right side of the plot, we see the current distribution. It confirms Fran oise s findings about the current popularity of build backends: Between 2018 and 2020 the graph exhibits significant fluctuations, due to the relatively low amount uploads utizing pyproject.toml files. During that early period, Flit started as the most popular build backend, but was eventually displaced by Setuptools and Poetry. Between 2020 and 2020, the overall usage of pyproject.toml files increased significantly. By the end of 2022, the share of Setuptools peaked at 70%. After 2020, other build backends experienced a gradual rise in popularity. Amongh these, Hatch emerged as a notable contender, steadily gaining traction and ultimately stabilizing at 10%. We can also look into the absolute distribution of build backends over time: Absolute distribution of build backends over time The plot shows that Setuptools has the strongest growth trajectory, surpassing all other build backends. Poetry and Hatch are growing at a comparable rate, but since Hatch started roughly 4 years after Poetry, it s lagging behind in popularity. Despite not being among the most widely used backends anymore, Flit maintains a steady and consistent growth pattern, indicating its enduring relevance in the Python packaging landscape. The script for downloading and analyzing the data can be found in my GitHub repository. It contains the results of the duckb query (so you don t have to download the full dataset) and the pickled dictionary, mapping the file hashes to the build backends, saving you days for downloading and analyzing the pyproject.toml files yourself.

20 January 2024

Niels Thykier: Making debputy: Writing declarative parsing logic

In this blog post, I will cover how debputy parses its manifest and the conceptual improvements I did to make parsing of the manifest easier. All instructions to debputy are provided via the debian/debputy.manifest file and said manifest is written in the YAML format. After the YAML parser has read the basic file structure, debputy does another pass over the data to extract the information from the basic structure. As an example, the following YAML file:
manifest-version: "0.1"
installations:
  - install:
      source: foo
      dest-dir: usr/bin
would be transformed by the YAML parser into a structure resembling:
 
  "manifest-version": "0.1",
  "installations": [
      
       "install":  
         "source": "foo",
         "dest-dir": "usr/bin",
        
      
  ]
 
This structure is then what debputy does a pass on to translate this into an even higher level format where the "install" part is translated into an InstallRule. In the original prototype of debputy, I would hand-write functions to extract the data that should be transformed into the internal in-memory high level format. However, it was quite tedious. Especially because I wanted to catch every possible error condition and report "You are missing the required field X at Y" rather than the opaque KeyError: X message that would have been the default. Beyond being tedious, it was also quite error prone. As an example, in debputy/0.1.4 I added support for the install rule and you should allegedly have been able to add a dest-dir: or an as: inside it. Except I crewed up the code and debputy was attempting to look up these keywords from a dict that could never have them. Hand-writing these parsers were so annoying that it demotivated me from making manifest related changes to debputy simply because I did not want to code the parsing logic. When I got this realization, I figured I had to solve this problem better. While reflecting on this, I also considered that I eventually wanted plugins to be able to add vocabulary to the manifest. If the API was "provide a callback to extract the details of whatever the user provided here", then the result would be bad.
  1. Most plugins would probably throw KeyError: X or ValueError style errors for quite a while. Worst case, they would end on my table because the user would have a hard time telling where debputy ends and where the plugins starts. "Best" case, I would teach debputy to say "This poor error message was brought to you by plugin foo. Go complain to them". Either way, it would be a bad user experience.
  2. This even assumes plugin providers would actually bother writing manifest parsing code. If it is that difficult, then just providing a custom file in debian might tempt plugin providers and that would undermine the idea of having the manifest be the sole input for debputy.
So beyond me being unsatisfied with the current situation, it was also clear to me that I needed to come up with a better solution if I wanted externally provided plugins for debputy. To put a bit more perspective on what I expected from the end result:
  1. It had to cover as many parsing errors as possible. An error case this code would handle for you, would be an error where I could ensure it sufficient degree of detail and context for the user.
  2. It should be type-safe / provide typing support such that IDEs/mypy could help you when you work on the parsed result.
  3. It had to support "normalization" of the input, such as
           # User provides
           - install: "foo"
           # Which is normalized into:
           - install:
               source: "foo"
4) It must be simple to tell  debputy  what input you expected.
At this point, I remembered that I had seen a Python (PYPI) package where you could give it a TypedDict and an arbitrary input (Sadly, I do not remember the name). The package would then validate the said input against the TypedDict. If the match was successful, you would get the result back casted as the TypedDict. If the match was unsuccessful, the code would raise an error for you. Conceptually, this seemed to be a good starting point for where I wanted to be. Then I looked a bit on the normalization requirement (point 3). What is really going on here is that you have two "schemas" for the input. One is what the programmer will see (the normalized form) and the other is what the user can input (the manifest form). The problem is providing an automatic normalization from the user input to the simplified programmer structure. To expand a bit on the following example:
# User provides
- install: "foo"
# Which is normalized into:
- install:
    source: "foo"
Given that install has the attributes source, sources, dest-dir, as, into, and when, how exactly would you automatically normalize "foo" (str) into source: "foo"? Even if the code filtered by "type" for these attributes, you would end up with at least source, dest-dir, and as as candidates. Turns out that TypedDict actually got this covered. But the Python package was not going in this direction, so I parked it here and started looking into doing my own. At this point, I had a general idea of what I wanted. When defining an extension to the manifest, the plugin would provide debputy with one or two definitions of TypedDict. The first one would be the "parsed" or "target" format, which would be the normalized form that plugin provider wanted to work on. For this example, lets look at an earlier version of the install-examples rule:
# Example input matching this typed dict.
#    
#       "source": ["foo"]
#       "into": ["pkg"]
#    
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]
In this form, the install-examples has two attributes - both are list of strings. On the flip side, what the user can input would look something like this:
# Example input matching this typed dict.
#    
#       "source": "foo"
#       "into": "pkg"
#    
#
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[str]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[str, List[str]]
FullInstallExamplesManifestFormat = Union[
    InstallExamplesManifestFormat,
    List[str],
    str,
]
The idea was that the plugin provider would use these two definitions to tell debputy how to parse install-examples. Pseudo-registration code could look something like:
def _handler(
    normalized_form: InstallExamplesTargetFormat,
) -> InstallRule:
    ...  # Do something with the normalized form and return an InstallRule.
concept_debputy_api.add_install_rule(
  keyword="install-examples",
  target_form=InstallExamplesTargetFormat,
  manifest_form=FullInstallExamplesManifestFormat,
  handler=_handler,
)
This was my conceptual target and while the current actual API ended up being slightly different, the core concept remains the same.
From concept to basic implementation Building this code is kind like swallowing an elephant. There was no way I would just sit down and write it from one end to the other. So the first prototype of this did not have all the features it has now. Spoiler warning, these next couple of sections will contain some Python typing details. When reading this, it might be helpful to know things such as Union[str, List[str]] being the Python type for either a str (string) or a List[str] (list of strings). If typing makes your head spin, these sections might less interesting for you. To build this required a lot of playing around with Python's introspection and typing APIs. My very first draft only had one "schema" (the normalized form) and had the following features:
  • Read TypedDict.__required_attributes__ and TypedDict.__optional_attributes__ to determine which attributes where present and which were required. This was used for reporting errors when the input did not match.
  • Read the types of the provided TypedDict, strip the Required / NotRequired markers and use basic isinstance checks based on the resulting type for str and List[str]. Again, used for reporting errors when the input did not match.
This prototype did not take a long (I remember it being within a day) and worked surprisingly well though with some poor error messages here and there. Now came the first challenge, adding the manifest format schema plus relevant normalization rules. The very first normalization I did was transforming into: Union[str, List[str]] into into: List[str]. At that time, source was not a separate attribute. Instead, sources was a Union[str, List[str]], so it was the only normalization I needed for all my use-cases at the time. There are two problems when writing a normalization. First is determining what the "source" type is, what the target type is and how they relate. The second is providing a runtime rule for normalizing from the manifest format into the target format. Keeping it simple, the runtime normalizer for Union[str, List[str]] -> List[str] was written as:
def normalize_into_list(x: Union[str, List[str]]) -> List[str]:
    return x if isinstance(x, list) else [x]
This basic form basically works for all types (assuming none of the types will have List[List[...]]). The logic for determining when this rule is applicable is slightly more involved. My current code is about 100 lines of Python code that would probably lose most of the casual readers. For the interested, you are looking for _union_narrowing in declarative_parser.py With this, when the manifest format had Union[str, List[str]] and the target format had List[str] the generated parser would silently map a string into a list of strings for the plugin provider. But with that in place, I had covered the basics of what I needed to get started. I was quite excited about this milestone of having my first keyword parsed without handwriting the parser logic (at the expense of writing a more generic parse-generator framework).
Adding the first parse hint With the basic implementation done, I looked at what to do next. As mentioned, at the time sources in the manifest format was Union[str, List[str]] and I considered to split into a source: str and a sources: List[str] on the manifest side while keeping the normalized form as sources: List[str]. I ended up committing to this change and that meant I had to solve the problem getting my parser generator to understand the situation:
# Map from
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[str]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[str, List[str]]
# ... into
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]
There are two related problems to solve here:
  1. How will the parser generator understand that source should be normalized and then mapped into sources?
  2. Once that is solved, the parser generator has to understand that while source and sources are declared as NotRequired, they are part of a exactly one of rule (since sources in the target form is Required). This mainly came down to extra book keeping and an extra layer of validation once the previous step is solved.
While working on all of this type introspection for Python, I had noted the Annotated[X, ...] type. It is basically a fake type that enables you to attach metadata into the type system. A very random example:
# For all intents and purposes,  foo  is a string despite all the  Annotated  stuff.
foo: Annotated[str, "hello world"] = "my string here"
The exciting thing is that you can put arbitrary details into the type field and read it out again in your introspection code. Which meant, I could add "parse hints" into the type. Some "quick" prototyping later (a day or so), I got the following to work:
# Map from
#      
#        "source": "foo"  # (or "sources": ["foo"])
#        "into": "pkg"
#      
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[
        Annotated[
            str,
            DebputyParseHint.target_attribute("sources")
        ]
    ]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[str, List[str]]
# ... into
#      
#        "source": ["foo"]
#        "into": ["pkg"]
#      
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]
Without me (as a plugin provider) writing a line of code, I can have debputy rename or "merge" attributes from the manifest form into the normalized form. Obviously, this required me (as the debputy maintainer) to write a lot code so other me and future plugin providers did not have to write it.
High level typing At this point, basic normalization between one mapping to another mapping form worked. But one thing irked me with these install rules. The into was a list of strings when the parser handed them over to me. However, I needed to map them to the actual BinaryPackage (for technical reasons). While I felt I was careful with my manual mapping, I knew this was exactly the kind of case where a busy programmer would skip the "is this a known package name" check and some user would typo their package resulting in an opaque KeyError: foo. Side note: "Some user" was me today and I was super glad to see debputy tell me that I had typoed a package name (I would have been more happy if I had remembered to use debputy check-manifest, so I did not have to wait through the upstream part of the build that happened before debhelper passed control to debputy...) I thought adding this feature would be simple enough. It basically needs two things:
  1. Conversion table where the parser generator can tell that BinaryPackage requires an input of str and a callback to map from str to BinaryPackage. (That is probably lie. I think the conversion table came later, but honestly I do remember and I am not digging into the git history for this one)
  2. At runtime, said callback needed access to the list of known packages, so it could resolve the provided string.
It was not super difficult given the existing infrastructure, but it did take some hours of coding and debugging. Additionally, I added a parse hint to support making the into conditional based on whether it was a single binary package. With this done, you could now write something like:
# Map from
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[
        Annotated[
            str,
            DebputyParseHint.target_attribute("sources")
        ]
    ]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[BinaryPackage, List[BinaryPackage]]
# ... into
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[
        Annotated[
            List[BinaryPackage],
            DebputyParseHint.required_when_multi_binary()
        ]
    ]
Code-wise, I still had to check for into being absent and providing a default for that case (that is still true in the current codebase - I will hopefully fix that eventually). But I now had less room for mistakes and a standardized error message when you misspell the package name, which was a plus.
The added side-effect - Introspection A lovely side-effect of all the parsing logic being provided to debputy in a declarative form was that the generated parser snippets had fields containing all expected attributes with their types, which attributes were required, etc. This meant that adding an introspection feature where you can ask debputy "What does an install rule look like?" was quite easy. The code base already knew all of this, so the "hard" part was resolving the input the to concrete rule and then rendering it to the user. I added this feature recently along with the ability to provide online documentation for parser rules. I covered that in more details in my blog post Providing online reference documentation for debputy in case you are interested. :)
Wrapping it up This was a short insight into how debputy parses your input. With this declarative technique:
  • The parser engine handles most of the error reporting meaning users get most of the errors in a standard format without the plugin provider having to spend any effort on it. There will be some effort in more complex cases. But the common cases are done for you.
  • It is easy to provide flexibility to users while avoiding having to write code to normalize the user input into a simplified programmer oriented format.
  • The parser handles mapping from basic types into higher forms for you. These days, we have high level types like FileSystemMode (either an octal or a symbolic mode), different kind of file system matches depending on whether globs should be performed, etc. These types includes their own validation and parsing rules that debputy handles for you.
  • Introspection and support for providing online reference documentation. Also, debputy checks that the provided attribute documentation covers all the attributes in the manifest form. If you add a new attribute, debputy will remind you if you forget to document it as well. :)
In this way everybody wins. Yes, writing this parser generator code was more enjoyable than writing the ad-hoc manual parsers it replaced. :)

2 January 2024

Matthew Garrett: Dealing with weird ELF libraries

Libraries are collections of code that are intended to be usable by multiple consumers (if you're interested in the etymology, watch this video). In the old days we had what we now refer to as "static" libraries, collections of code that existed on disk but which would be copied into newly compiled binaries. We've moved beyond that, thankfully, and now make use of what we call "dynamic" or "shared" libraries - instead of the code being copied into the binary, a reference to the library function is incorporated, and at runtime the code is mapped from the on-disk copy of the shared object[1]. This allows libraries to be upgraded without needing to modify the binaries using them, and if multiple applications are using the same library at once it only requires that one copy of the code be kept in RAM.

But for this to work, two things are necessary: when we build a binary, there has to be a way to reference the relevant library functions in the binary; and when we run a binary, the library code needs to be mapped into the process.

(I'm going to somewhat simplify the explanations from here on - things like symbol versioning make this a bit more complicated but aren't strictly relevant to what I was working on here)

For the first of these, the goal is to replace a call to a function (eg, printf()) with a reference to the actual implementation. This is the job of the linker rather than the compiler (eg, if you use the -c argument to tell gcc to simply compile to an object rather than linking an executable, it's not going to care about whether or not every function called in your code actually exists or not - that'll be figured out when you link all the objects together), and the linker needs to know which symbols (which aren't just functions - libraries can export variables or structures and so on) are available in which libraries. You give the linker a list of libraries, it extracts the symbols available, and resolves the references in your code with references to the library.

But how is that information extracted? Each ELF object has a fixed-size header that contains references to various things, including a reference to a list of "section headers". Each section has a name and a type, but the ones we're interested in are .dynstr and .dynsym. .dynstr contains a list of strings, representing the name of each exported symbol. .dynsym is where things get more interesting - it's a list of structs that contain information about each symbol. This includes a bunch of fairly complicated stuff that you need to care about if you're actually writing a linker, but the relevant entries for this discussion are an index into .dynstr (which means the .dynsym entry isn't sufficient to know the name of a symbol, you need to extract that from .dynstr), along with the location of that symbol within the library. The linker can parse this information and obtain a list of symbol names and addresses, and can now replace the call to printf() with a reference to libc instead.

(Note that it's not possible to simply encode this as "Call this address in this library" - if the library is rebuilt or is a different version, the function could move to a different location)

Experimentally, .dynstr and .dynsym appear to be sufficient for linking a dynamic library at build time - there are other sections related to dynamic linking, but you can link against a library that's missing them. Runtime is where things get more complicated.

When you run a binary that makes use of dynamic libraries, the code from those libraries needs to be mapped into the resulting process. This is the job of the runtime dynamic linker, or RTLD[2]. The RTLD needs to open every library the process requires, map the relevant code into the process's address space, and then rewrite the references in the binary into calls to the library code. This requires more information than is present in .dynstr and .dynsym - at the very least, it needs to know the list of required libraries.

There's a separate section called .dynamic that contains another list of structures, and it's the data here that's used for this purpose. For example, .dynamic contains a bunch of entries of type DT_NEEDED - this is the list of libraries that an executable requires. There's also a bunch of other stuff that's required to actually make all of this work, but the only thing I'm going to touch on is DT_HASH. Doing all this re-linking at runtime involves resolving the locations of a large number of symbols, and if the only way you can do that is by reading a list from .dynsym and then looking up every name in .dynstr that's going to take some time. The DT_HASH entry points to a hash table - the RTLD hashes the symbol name it's trying to resolve, looks it up in that hash table, and gets the symbol entry directly (it still needs to resolve that against .dynstr to make sure it hasn't hit a hash collision - if it has it needs to look up the next hash entry, but this is still generally faster than walking the entire .dynsym list to find the relevant symbol). There's also DT_GNU_HASH which fulfills the same purpose as DT_HASH but uses a more complicated algorithm that performs even better. .dynamic also contains entries pointing at .dynstr and .dynsym, which seems redundant but will become relevant shortly.

So, .dynsym and .dynstr are required at build time, and both are required along with .dynamic at runtime. This seems simple enough, but obviously there's a twist and I'm sorry it's taken so long to get to this point.

I bought a Synology NAS for home backup purposes (my previous solution was a single external USB drive plugged into a small server, which had uncomfortable single point of failure properties). Obviously I decided to poke around at it, and I found something odd - all the libraries Synology ships were entirely lacking any ELF section headers. This meant no .dynstr, .dynsym or .dynamic sections, so how was any of this working? nm asserted that the libraries exported no symbols, and readelf agreed. If I wrote a small app that called a function in one of the libraries and built it, gcc complained that the function was undefined. But executables on the device were clearly resolving the symbols at runtime, and if I loaded them into ghidra the exported functions were visible. If I dlopen()ed them, dlsym() couldn't resolve the symbols - but if I hardcoded the offset into my code, I could call them directly.

Things finally made sense when I discovered that if I passed the --use-dynamic argument to readelf, I did get a list of exported symbols. It turns out that ELF is weirder than I realised. As well as the aforementioned section headers, ELF objects also include a set of program headers. One of the program header types is PT_DYNAMIC. This typically points to the same data that's present in the .dynamic section. Remember when I mentioned that .dynamic contained references to .dynsym and .dynstr? This means that simply pointing at .dynamic is sufficient, there's no need to have separate entries for them.

The same information can be reached from two different locations. The information in the section headers is used at build time, and the information in the program headers at run time[3]. I do not have an explanation for this. But if the information is present in two places, it seems obvious that it should be able to reconstruct the missing section headers in my weird libraries? So that's what this does. It extracts information from the DYNAMIC entry in the program headers and creates equivalent section headers.

There's one thing that makes this more difficult than it might seem. The section header for .dynsym has to contain the number of symbols present in the section. And that information doesn't directly exist in DYNAMIC - to figure out how many symbols exist, you're expected to walk the hash tables and keep track of the largest number you've seen. Since every symbol has to be referenced in the hash table, once you've hit every entry the largest number is the number of exported symbols. This seemed annoying to implement, so instead I cheated, added code to simply pass in the number of symbols on the command line, and then just parsed the output of readelf against the original binaries to extract that information and pass it to my tool.

Somehow, this worked. I now have a bunch of library files that I can link into my own binaries to make it easier to figure out how various things on the Synology work. Now, could someone explain (a) why this information is present in two locations, and (b) why the build-time linker and run-time linker disagree on the canonical source of truth?

[1] "Shared object" is the source of the .so filename extension used in various Unix-style operating systems
[2] You'll note that "RTLD" is not an acryonym for "runtime dynamic linker", because reasons
[3] For environments using the GNU RTLD, at least - I have no idea whether this is the case in all ELF environments

comment count unavailable comments

1 January 2024

Russ Allbery: 2023 Book Reading in Review

In 2023, I finished and reviewed 53 books, continuing a trend of year-over-year increases and of reading the most books since 2012 (the last year I averaged five books a month). Reviewing continued to be uneven, with a significant slump in the summer and smaller slumps in February and November, and a big clump of reviews finished in October in addition to my normal year-end reading and reviewing vacation. The unevenness this year was mostly due to finishing books and not writing reviews immediately. Reviews are much harder to write when the finished books are piling up, so one goal for 2024 is to not let that happen again. I enter the new year with one book finished and not yet reviewed, after reading a book about every day and a half during my December vacation. I read two all-time favorite books this year. The first was Emily Tesh's debut novel Some Desperate Glory, which is one of the best space opera novels I have ever read. I cannot improve on Shelley Parker-Chan's blurb for this book: "Fierce and heartbreakingly humane, this book is for everyone who loved Ender's Game, but Ender's Game didn't love them back." This is not hard science fiction but it is fantastic character fiction. It was exactly what I needed in the middle of a year in which I was fighting a "burn everything down" mood. The second was Night Watch by Terry Pratchett, the 29th Discworld and 6th Watch novel. Throughout my Discworld read-through, Pratchett felt like he was on the cusp of a truly stand-out novel, one where all the pieces fit and the book becomes something more than the sum of its parts. This was that book. It's a book about ethics and revolutions and governance, but also about how your perception of yourself changes as you get older. It does all of the normal Pratchett things, just... better. While I would love to point new Discworld readers at it, I think you do have to read at least the Watch novels that came before it for it to carry its proper emotional heft. This was overall a solid year for fiction reading. I read another 15 novels I rated 8 out of 10, and 12 that I rated 7 out of 10. The largest contributor to that was my Discworld read-through, which was reliably entertaining throughout the year. The run of Discworld books between The Fifth Elephant (read late last year) and Wintersmith (my last of this year) was the best run of Discworld novels so far. One additional book I'll call out as particularly worth reading is Thud!, the Watch novel after Night Watch and another excellent entry. I read two stand-out non-fiction books this year. The first was Oliver Darkshire's delightful memoir about life as a rare book seller, Once Upon a Tome. One of the things I will miss about Twitter is the regularity with which I stumbled across fascinating people and then got to read their books. I'm off Twitter permanently now because the platform is designed to make me incoherently angry and I need less of that in my life, but it was very good at finding delightfully quirky books like this one. My other favorite non-fiction book of the year was Michael Lewis's Going Infinite, a profile of Sam Bankman-Fried. I'm still bemused at the negative reviews that this got from people who were upset that Lewis didn't turn the story into a black-and-white morality play. Bankman-Fried's actions were clearly criminal; that's not in dispute. Human motivations can be complex in ways that are irrelevant to the law, and I thought this attempt to understand that complexity by a top-notch storyteller was worthy of attention. Also worth a mention is Tony Judt's Postwar, the first book I reviewed in 2023. A sprawling history of post-World-War-II Europe will never have the sheer readability of shorter, punchier books, but this was the most informative book that I read in 2023. 2024 should see the conclusion of my Discworld read-through, after which I may return to re-reading Mercedes Lackey or David Eddings, both of which I paused to make time for Terry Pratchett. I also have another re-read similar to my Chronicles of Narnia reviews that I've been thinking about for a while. Perhaps I will start that next year; perhaps it will wait for 2025. Apart from that, my intention as always is to read steadily, write reviews as close to when I finished the book as possible, and make reading time for my huge existing backlog despite the constant allure of new releases. Here's to a new year full of more new-to-me books and occasional old favorites. The full analysis includes some additional personal reading statistics, probably only of interest to me.

25 December 2023

Sergio Talens-Oliag: GitLab CI/CD Tips: Automatic Versioning Using semantic-release

This post describes how I m using semantic-release on gitlab-ci to manage versioning automatically for different kinds of projects following a simple workflow (a develop branch where changes are added or merged to test new versions, a temporary release/#.#.# to generate the release candidate versions and a main branch where the final versions are published).

What is semantic-releaseIt is a Node.js application designed to manage project versioning information on Git Repositories using a Continuous integration system (in this post we will use gitlab-ci)

How does it workBy default semantic-release uses semver for versioning (release versions use the format MAJOR.MINOR.PATCH) and commit messages are parsed to determine the next version number to publish. If after analyzing the commits the version number has to be changed, the command updates the files we tell it to (i.e. the package.json file for nodejs projects and possibly a CHANGELOG.md file), creates a new commit with the changed files, creates a tag with the new version and pushes the changes to the repository. When running on a CI/CD system we usually generate the artifacts related to a release (a package, a container image, etc.) from the tag, as it includes the right version number and usually has passed all the required tests (it is a good idea to run the tests again in any case, as someone could create a tag manually or we could run extra jobs when building the final assets if they fail it is not a big issue anyway, numbers are cheap and infinite, so we can skip releases if needed).

Commit messages and versioningThe commit messages must follow a known format, the default module used to analyze them uses the angular git commit guidelines, but I prefer the conventional commits one, mainly because it s a lot easier to use when you want to update the MAJOR version. The commit message format used must be:
<type>(optional scope): <description>
[optional body]
[optional footer(s)]
The system supports three types of branches: release, maintenance and pre-release, but for now I m not using maintenance ones. The branches I use and their types are:
  • main as release branch (final versions are published from there)
  • develop as pre release branch (used to publish development and testing versions with the format #.#.#-SNAPSHOT.#)
  • release/#.#.# as pre release branches (they are created from develop to publish release candidate versions with the format #.#.#-rc.# and once they are merged with main they are deleted)
On the release branch (main) the version number is updated as follows:
  1. The MAJOR number is incremented if a commit with a BREAKING CHANGE: footer or an exclamation (!) after the type/scope is found in the list of commits found since the last version change (it looks for tags on the same branch).
  2. The MINOR number is incremented if the MAJOR number is not going to be changed and there is a commit with type feat in the commits found since the last version change.
  3. The PATCH number is incremented if neither the MAJOR nor the MINOR numbers are going to be changed and there is a commit with type fix in the the commits found since the last version change.
On the pre release branches (develop and release/#.#.#) the version and pre release numbers are always calculated from the last published version available on the branch (i. e. if we published version 1.3.2 on main we need to have the commit with that tag on the develop or release/#.#.# branch to get right what will be the next version). The version number is updated as follows:
  1. The MAJOR number is incremented if a commit with a BREAKING CHANGE: footer or an exclamation (!) after the type/scope is found in the list of commits found since the last released version.In our example it was 1.3.2 and the version is updated to 2.0.0-SNAPSHOT.1 or 2.0.0-rc.1 depending on the branch.
  2. The MINOR number is incremented if the MAJOR number is not going to be changed and there is a commit with type feat in the commits found since the last released version.In our example the release was 1.3.2 and the version is updated to 1.4.0-SNAPSHOT.1 or 1.4.0-rc.1 depending on the branch.
  3. The PATCH number is incremented if neither the MAJOR nor the MINOR numbers are going to be changed and there is a commit with type fix in the the commits found since the last version change.In our example the release was 1.3.2 and the version is updated to 1.3.3-SNAPSHOT.1 or 1.3.3-rc.1 depending on the branch.
  4. The pre release number is incremented if the MAJOR, MINOR and PATCH numbers are not going to be changed but there is a commit that would otherwise update the version (i.e. a fix on 1.3.3-SNAPSHOT.1 will set the version to 1.3.3-SNAPSHOT.2, a fix or feat on 1.4.0-rc.1 will set the version to 1.4.0-rc.2 an so on).

How do we manage its configurationAlthough the system is designed to work with nodejs projects, it can be used with multiple programming languages and project types. For nodejs projects the usual place to put the configuration is the project s package.json, but I prefer to use the .releaserc file instead. As I use a common set of CI templates, instead of using a .releaserc on each project I generate it on the fly on the jobs that need it, replacing values related to the project type and the current branch on a template using the tmpl command (lately I use a branch of my own fork while I wait for some feedback from upstream, as you will see on the Dockerfile).

Container used to run itAs we run the command on a gitlab-ci job we use the image built from the following Dockerfile:
Dockerfile
# Semantic release image
FROM golang:alpine AS tmpl-builder
#RUN go install github.com/krakozaure/tmpl@v0.4.0
RUN go install github.com/sto/tmpl@v0.4.0-sto.2
FROM node:lts-alpine
COPY --from=tmpl-builder /go/bin/tmpl /usr/local/bin/tmpl
RUN apk update &&\
  apk upgrade &&\
  apk add curl git jq openssh-keygen yq zip &&\
  npm install --location=global\
    conventional-changelog-conventionalcommits@6.1.0\
    @qiwi/multi-semantic-release@7.0.0\
    semantic-release@21.0.7\
    @semantic-release/changelog@6.0.3\
    semantic-release-export-data@1.0.1\
    @semantic-release/git@10.0.1\
    @semantic-release/gitlab@9.5.1\
    @semantic-release/release-notes-generator@11.0.4\
    semantic-release-replace-plugin@1.2.7\
    semver@7.5.4\
  &&\
  rm -rf /var/cache/apk/*
CMD ["/bin/sh"]

How and when is it executedThe job that runs semantic-release is executed when new commits are added to the develop, release/#.#.# or main branches (basically when something is merged or pushed) and after all tests have passed (we don t want to create a new version that does not compile or passes at least the unit tests). The job is something like the following:
semantic_release:
  image: $SEMANTIC_RELEASE_IMAGE
  rules:
    - if: '$CI_COMMIT_BRANCH =~ /^(develop main release\/\d+.\d+.\d+)$/'
      when: always
  stage: release
  before_script:
    - echo "Loading scripts.sh"
    - . $ASSETS_DIR/scripts.sh
  script:
    - sr_gen_releaserc_json
    - git_push_setup
    - semantic-release
Where the SEMANTIC_RELEASE_IMAGE variable contains the URI of the image built using the Dockerfile above and the sr_gen_releaserc_json and git_push_setup are functions defined on the $ASSETS_DIR/scripts.sh file:
  • The sr_gen_releaserc_json function generates the .releaserc.json file using the tmpl command.
  • The git_push_setup function configures git to allow pushing changes to the repository with the semantic-release command, optionally signing them with a SSH key.

The sr_gen_releaserc_json functionThe code for the sr_gen_releaserc_json function is the following:
sr_gen_releaserc_json()
 
  # Use nodejs as default project_type
  project_type="$ PROJECT_TYPE:-nodejs "
  # REGEX to match the rc_branch name
  rc_branch_regex='^release\/[0-9]\+\.[0-9]\+\.[0-9]\+$'
  # PATHS on the local ASSETS_DIR
  assets_dir="$ CI_PROJECT_DIR /$ ASSETS_DIR "
  sr_local_plugin="$ assets_dir /local-plugin.cjs"
  releaserc_tmpl="$ assets_dir /releaserc.json.tmpl"
  pipeline_runtime_values_yaml="/tmp/releaserc_values.yaml"
  pipeline_values_yaml="$ assets_dir /values_$ project_type _project.yaml"
  # Destination PATH
  releaserc_json=".releaserc.json"
  # Create an empty pipeline_values_yaml if missing
  test -f "$pipeline_values_yaml"   : >"$pipeline_values_yaml"
  # Create the pipeline_runtime_values_yaml file
  echo "branch: $ CI_COMMIT_BRANCH " >"$pipeline_runtime_values_yaml"
  echo "gitlab_url: $ CI_SERVER_URL " >"$pipeline_runtime_values_yaml"
  # Add the rc_branch name if we are on an rc_branch
  if [ "$(echo "$CI_COMMIT_BRANCH"   sed -ne "/$rc_branch_regex/ p ")" ]; then
    echo "rc_branch: $ CI_COMMIT_BRANCH " >>"$pipeline_runtime_values_yaml"
  elif [ "$(echo "$CI_MERGE_REQUEST_SOURCE_BRANCH_NAME"  
      sed -ne "/$rc_branch_regex/ p ")" ]; then
    echo "rc_branch: $ CI_MERGE_REQUEST_SOURCE_BRANCH_NAME " \
      >>"$pipeline_runtime_values_yaml"
  fi
  echo "sr_local_plugin: $ sr_local_plugin " >>"$pipeline_runtime_values_yaml"
  # Create the releaserc_json file
  tmpl -f "$pipeline_runtime_values_yaml" -f "$pipeline_values_yaml" \
    "$releaserc_tmpl"   jq . >"$releaserc_json"
  # Remove the pipeline_runtime_values_yaml file
  rm -f "$pipeline_runtime_values_yaml"
  # Print the releaserc_json file
  print_file_collapsed "$releaserc_json"
  # --*-- BEG: NOTE --*--
  # Rename the package.json to ignore it when calling semantic release.
  # The idea is that the local-plugin renames it back on the first step of the
  # semantic-release process.
  # --*-- END: NOTE --*--
  if [ -f "package.json" ]; then
    echo "Renaming 'package.json' to 'package.json_disabled'"
    mv "package.json" "package.json_disabled"
  fi
 
Almost all the variables used on the function are defined by gitlab except the ASSETS_DIR and PROJECT_TYPE; in the complete pipelines the ASSETS_DIR is defined on a common file included by all the pipelines and the project type is defined on the .gitlab-ci.yml file of each project. If you review the code you will see that the file processed by the tmpl command is named releaserc.json.tmpl, its contents are shown here:
 
  "plugins": [
     - if .sr_local_plugin  
    "  .sr_local_plugin  ",
     - end  
    [
      "@semantic-release/commit-analyzer",
       
        "preset": "conventionalcommits",
        "releaseRules": [
            "breaking": true, "release": "major"  ,
            "revert": true, "release": "patch"  ,
            "type": "feat", "release": "minor"  ,
            "type": "fix", "release": "patch"  ,
            "type": "perf", "release": "patch"  
        ]
       
    ],
     - if .replacements  
    [
      "semantic-release-replace-plugin",
        "replacements":   .replacements   toJson    
    ],
     - end  
    "@semantic-release/release-notes-generator",
     - if eq .branch "main"  
    [
      "@semantic-release/changelog",
        "changelogFile": "CHANGELOG.md", "changelogTitle": "# Changelog"  
    ],
     - end  
    [
      "@semantic-release/git",
       
        "assets":   if .assets   .assets   toJson   else  []  end  ,
        "message": "ci(release): v$ nextRelease.version \n\n$ nextRelease.notes "
       
    ],
    [
      "@semantic-release/gitlab",
        "gitlabUrl": "  .gitlab_url  ", "successComment": false  
    ]
  ],
  "branches": [
      "name": "develop", "prerelease": "SNAPSHOT"  ,
     - if .rc_branch  
      "name": "  .rc_branch  ", "prerelease": "rc"  ,
     - end  
    "main"
  ]
 
The values used to process the template are defined on a file built on the fly (releaserc_values.yaml) that includes the following keys and values:
  • branch: the name of the current branch
  • gitlab_url: the URL of the gitlab server (the value is taken from the CI_SERVER_URL variable)
  • rc_branch: the name of the current rc branch; we only set the value if we are processing one because semantic-release only allows one branch to match the rc prefix and if we use a wildcard (i.e. release/*) but the users keep more than one release/#.#.# branch open at the same time the calls to semantic-release will fail for sure.
  • sr_local_plugin: the path to the local plugin we use (shown later)
The template also uses a values_$ project_type _project.yaml file that includes settings specific to the project type, the one for nodejs is as follows:
replacements:
  - files:
      - "package.json"
    from: "\"version\": \".*\""
    to: "\"version\": \"$ nextRelease.version \""
assets:
  - "CHANGELOG.md"
  - "package.json"
The replacements section is used to update the version field on the relevant files of the project (in our case the package.json file) and the assets section includes the files that will be committed to the repository when the release is published (looking at the template you can see that the CHANGELOG.md is only updated for the main branch, we do it this way because if we update the file on other branches it creates a merge nightmare and we are only interested on it for released versions anyway). The local plugin adds code to rename the package.json_disabled file to package.json if present and prints the last and next versions on the logs with a format that can be easily parsed using sed:
local-plugin.cjs
// Minimal plugin to:
// - rename the package.json_disabled file to package.json if present
// - log the semantic-release last & next versions
function verifyConditions(pluginConfig, context)  
  var fs = require('fs');
  if (fs.existsSync('package.json_disabled'))  
    fs.renameSync('package.json_disabled', 'package.json');
    context.logger.log( verifyConditions: renamed 'package.json_disabled' to 'package.json' );
   
 
function analyzeCommits(pluginConfig, context)  
  if (context.lastRelease && context.lastRelease.version)  
    context.logger.log( analyzeCommits: LAST_VERSION=$ context.lastRelease.version  );
   
 
function verifyRelease(pluginConfig, context)  
  if (context.nextRelease && context.nextRelease.version)  
    context.logger.log( verifyRelease: NEXT_VERSION=$ context.nextRelease.version  );
   
 
module.exports =  
  verifyConditions,
  analyzeCommits,
  verifyRelease
 

The git_push_setup functionThe code for the git_push_setup function is the following:
git_push_setup()
 
  # Update global credentials to allow git clone & push for all the group repos
  git config --global credential.helper store
  cat >"$HOME/.git-credentials" <<EOF
https://fake-user:$ GITLAB_REPOSITORY_TOKEN @gitlab.com
EOF
  # Define user name, mail and signing key for semantic-release
  user_name="$SR_USER_NAME"
  user_email="$SR_USER_EMAIL"
  ssh_signing_key="$SSH_SIGNING_KEY"
  # Export git user variables
  export GIT_AUTHOR_NAME="$user_name"
  export GIT_AUTHOR_EMAIL="$user_email"
  export GIT_COMMITTER_NAME="$user_name"
  export GIT_COMMITTER_EMAIL="$user_email"
  # Sign commits with ssh if there is a SSH_SIGNING_KEY variable
  if [ "$ssh_signing_key" ]; then
    echo "Configuring GIT to sign commits with SSH"
    ssh_keyfile="/tmp/.ssh-id"
    : >"$ssh_keyfile"
    chmod 0400 "$ssh_keyfile"
    echo "$ssh_signing_key"   tr -d '\r' >"$ssh_keyfile"
    git config gpg.format ssh
    git config user.signingkey "$ssh_keyfile"
    git config commit.gpgsign true
  fi
 
The function assumes that the GITLAB_REPOSITORY_TOKEN variable (set on the CI/CD variables section of the project or group we want) contains a token with read_repository and write_repository permissions on all the projects we are going to use this function. The SR_USER_NAME and SR_USER_EMAIL variables can be defined on a common file or the CI/CD variables section of the project or group we want to work with and the script assumes that the optional SSH_SIGNING_KEY is exported as a CI/CD default value of type variable (that is why the keyfile is created on the fly) and git is configured to use it if the variable is not empty.
Warning: Keep in mind that the variables GITLAB_REPOSITORY_TOKEN and SSH_SIGNING_KEY contain secrets, so probably is a good idea to make them protected (if you do that you have to make the develop, main and release/* branches protected too).
Warning: The semantic-release user has to be able to push to all the projects on those protected branches, it is a good idea to create a dedicated user and add it as a MAINTAINER for the projects we want (the MAINTAINERS need to be able to push to the branches), or, if you are using a Gitlab with a Premium license you can use the api to allow the semantic-release user to push to the protected branches without allowing it for any other user.

The semantic-release commandOnce we have the .releaserc file and the git configuration ready we run the semantic-release command. If the branch we are working with has one or more commits that will increment the version, the tool does the following (note that the steps are described are the ones executed if we use the configuration we have generated):
  1. It detects the commits that will increment the version and calculates the next version number.
  2. Generates the release notes for the version.
  3. Applies the replacements defined on the configuration (in our example updates the version field on the package.json file).
  4. Updates the CHANGELOG.md file adding the release notes if we are going to publish the file (when we are on the main branch).
  5. Creates a commit if all or some of the files listed on the assets key have changed and uses the commit message we have defined, replacing the variables for their current values.
  6. Creates a tag with the new version number and the release notes.
  7. As we are using the gitlab plugin after tagging it also creates a release on the project with the tag name and the release notes.

Notes about the git workflows and merges between branchesIt is very important to remember that semantic-release looks at the commits of a given branch when calculating the next version to publish, that has two important implications:
  1. On pre release branches we need to have the commit that includes the tag with the released version, if we don t have it the next version is not calculated correctly.
  2. It is a bad idea to squash commits when merging a branch to another one, if we do that we will lose the information semantic-release needs to calculate the next version and even if we use the right prefix for the squashed commit (fix, feat, ) we miss all the messages that would otherwise go to the CHANGELOG.md file.
To make sure that we have the right commits on the pre release branches we should merge the main branch changes into the develop one after each release tag is created; in my pipelines the fist job that processes a release tag creates a branch from the tag and an MR to merge it to develop. The important thing about that MR is that is must not be squashed, if we do that the tag commit will probably be lost, so we need to be careful. To merge the changes directly we can run the following code:
# Set the SR_TAG variable to the tag you want to process
SR_TAG="v1.3.2"
# Fetch all the changes
git fetch --all --prune
# Switch to the main branch
git switch main
# Pull all the changes
git pull
# Switch to the development branch
git switch develop
# Pull all the changes
git pull
# Create followup branch from tag
git switch -c "followup/$SR_TAG" "$SR_TAG"
# Change files manually & commit the changed files
git commit -a --untracked-files=no -m "ci(followup): $SR_TAG to develop"
# Switch to the development branch
git switch develop
# Merge the followup branch into the development one using the --no-ff option
git merge --no-ff "followup/$SR_TAG"
# Remove the followup branch
git branch -d "followup/$SR_TAG"
# Push the changes
git push
If we can t push directly to develop we can create a MR pushing the followup branch after committing the changes, but we have to make sure that we don t squash the commits when merging or it will not work as we want.

22 December 2023

Russ Allbery: Review: Wintersmith

Review: Wintersmith, by Terry Pratchett
Series: Discworld #35
Publisher: Clarion Books
Copyright: 2006
Printing: 2007
ISBN: 0-06-089033-9
Format: Mass market
Pages: 450
Wintersmith is the 35th Discworld novel and the 3rd Tiffany Aching novel. You could probably start here, since understanding the backstory isn't vital for following the plot, but I'm not sure why you would. Tiffany is now training with Miss Treason, a 113-year-old witch who is quite different in her approach from Miss Level, Tiffany's mentor in A Hat Full of Sky. Miss Level was the unassuming and constantly helpful glue that held the neighborhood together. Miss Treason is the judge; her neighbors are scared of her and proud of being scared of her, since that means they have a proper witch who can see into their heads and sort out their problems. On the surface, they're quite different; part of the story of this book is Tiffany learning to see the similarities. First, though, Miss Treason rushes Tiffany to a strange midnight Morris Dance, without any explanation. The Morris Dance usually celebrates the coming of spring and is at the center of a village party, so Tiffany is quite confused by seeing it danced on a dark and windy night in late autumn. But there is a hole in the dance where the Fool normally is, and Tiffany can't keep herself from joining it. This proves to be a mistake. That space was left for someone very different from Tiffany, and now she's entangled herself in deep magic that she doesn't understand. This is another Pratchett novel where the main storyline didn't do much for me. All the trouble stems from Miss Treason being maddeningly opaque, and while she did warn Tiffany, she did so in that way that guarantees a protagonist of a middle-grade novel will ignore. The Wintersmith is a boring, one-note quasi-villain, and the plot mainly revolves around elemental powers being dumber than a sack of hammers. The one thing I will say about the main plot is that the magic Tiffany danced into is entangled with courtship and romance, Tiffany turns thirteen over the course of this book, and yet this is not weird and uncomfortable reading the way it would be in the hands of many other authors. Pratchett has a keen eye for the age range that he's targeting. The first awareness that there is such a thing as romance that might be relevant to oneself pairs nicely with the Wintersmith's utter confusion at how Tiffany's intrusion unbalanced his dance. This is a very specific age and experience that I think a lot of authors would shy away from, particularly with a female protagonist, and I thought Pratchett handled it adroitly. I personally found the Wintersmith's awkward courting tedious and annoying, but that's more about me than about the book. As with A Hat Full of Sky, though, everything other than the main plot was great. It is becoming obvious how much Tiffany and Granny Weatherwax have in common, and that Granny Weatherwax recognizes this and is training Tiffany herself. This is high-quality coming-of-age material, not in the traditional fantasy sense of chosen ones and map explorations, but in the sense of slowly-developing empathy and understanding of people who think differently than you do. Tiffany, like Granny Weatherwax, has very little patience with nonsense, and her irritation with stupidity is one of her best characteristics. But she's learning how to blunt it long enough to pay attention, and to understand how people she doesn't like can still be the right people for specific situations. I particularly loved how Granny carries on with a feud at the same time that Tiffany is learning to let go of one. It's not a contradiction or hypocrisy; it's a sign that Tiffany is entitled to her judgments and feelings, but has to learn how to keep them in their place and not let them take over. One of the great things about the Tiffany Aching books is that the villages are also characters. We don't see that much of the individual people, but one of the things Tiffany is learning is how to see the interpersonal dynamics and patterns of village life. Somehow the feelings of irritation and exasperation fade once you understand people's motives and see more sides to their character. There is a lot more Nanny Ogg in this book than there has been in the last few, and that reminded me of how much I love her character. She has a completely different approach than Granny Weatherwax, but it's just as effective in different ways. She's also the perfect witch to have around when you've stumbled into a stylized love story that you don't want to be a part of, and yet find oddly fascinating. It says something about the skill of Pratchett's characterization that I could enjoy a book this much while having no interest in the main plot. The Witches have always been great characters, but somehow they're even better when seen through Tiffany's perspective. Good stuff; if you liked any of the other Tiffany Aching books, you will like this as well. Followed by Making Money in publication order. The next Tiffany Aching novel is I Shall Wear Midnight. Rating: 8 out of 10

19 December 2023

Fran ois Marier: Filtering your own spam using SpamAssassin

I know that people rave about GMail's spam filtering, but it didn't work for me: I was seeing too many false positives. I personally prefer to see some false negatives (i.e. letting some spam through), but to reduce false positives as much as possible (and ideally have a way to tune this). Here's the local SpamAssassin setup I have put together over many years. In addition to the parts I describe here, I also turn off greylisting on my email provider (KolabNow) because I don't want to have to wait for up to 10 minutes for a "2FA" email to go through. This setup assumes that you download all of your emails to your local machine. I use fetchmail for this, though similar tools should work too.

Three tiers of emails The main reason my setup works for me, despite my receiving hundreds of spam messages every day, is that I split incoming emails into three tiers via procmail:
  1. not spam: delivered to inbox
  2. likely spam: quarantined in a soft_spam/ folder
  3. definitely spam: silently deleted
I only ever have to review the likely spam tier for false positives, which is on the order of 10-30 spam emails a day. I never even see the the hundreds that are silently deleted due to a very high score. This is implemented based on a threshold in my .procmailrc:
# Use spamassassin to check for spam
:0fw: .spamassassin.lock
  /usr/bin/spamassassin
# Throw away messages with a score of > 12.0
:0
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*
/dev/null
:0:
* ^X-Spam-Status: Yes
$HOME/Mail/soft_spam/
# Deliver all other messages
:0:
$ DEFAULT 
I also use the following ~/.muttrc configuration to easily report false negatives/positives and examine my likely spam folder via a shortcut in mutt:
unignore X-Spam-Level
unignore X-Spam-Status
macro index S "c=soft_spam/\n" "Switch to soft_spam"
# Tell mutt about SpamAssassin headers so that I can sort by spam score
spam "X-Spam-Status: (Yes No), (hits score)=(-?[0-9]+\.[0-9])" "%3"
folder-hook =soft_spam 'push ol'
folder-hook =spam 'push ou'
# <Esc>d = de-register as non-spam, register as spam, move to spam folder.
macro index \ed "<enter-command>unset wait_key\n<pipe-entry>spamassassin -r\n<enter-command>set wait_key\n<save-message>=spam\n" "report the message as spam"
# <Esc>u = unregister as spam, register as non-spam, move to inbox folder.
macro index \eu "<enter-command>unset wait_key\n<pipe-entry>spamassassin -k\n<enter-command>set wait_key\n<save-message>=inbox\n" "correct the false positive (this is not spam)"

Custom SpamAssassin rules In addition to the default ruleset that comes with SpamAssassin, I've also accrued a number of custom rules over the years. The first set comes from the (now defunct) SpamAssassin Rules Emporium. The second set is the one that backs bugs.debian.org and lists.debian.org. Note this second one includes archived copies of some of the SARE rules and so I only use some of the rules in the common/ directory. Finally, I wrote a few custom rules of my own based on specific kinds of emails I have seen slip through the cracks. I haven't written any of those in a long time and I suspect some of my rules are now obsolete. You may want to do your own testing before you copy these outright. In addition to rules to match more spam, I've also written a ruleset to remove false positives in French emails coming from many of the above custom rules. I also wrote a rule to get a bonus to any email that comes with a patch:
describe FM_PATCH   Includes a patch
body FM_PATCH   /\bdiff -pruN\b/
score FM_PATCH  -1.0
since it's not very common in spam emails :)

SpamAssassin settings When it comes to my system-wide SpamAssassin configuration in /etc/spamassassin/, I enable the following plugins:
loadplugin Mail::SpamAssassin::Plugin::AntiVirus
loadplugin Mail::SpamAssassin::Plugin::AskDNS
loadplugin Mail::SpamAssassin::Plugin::ASN
loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
loadplugin Mail::SpamAssassin::Plugin::Bayes
loadplugin Mail::SpamAssassin::Plugin::BodyEval
loadplugin Mail::SpamAssassin::Plugin::Check
loadplugin Mail::SpamAssassin::Plugin::DKIM
loadplugin Mail::SpamAssassin::Plugin::DNSEval
loadplugin Mail::SpamAssassin::Plugin::FreeMail
loadplugin Mail::SpamAssassin::Plugin::FromNameSpoof
loadplugin Mail::SpamAssassin::Plugin::HashBL
loadplugin Mail::SpamAssassin::Plugin::HeaderEval
loadplugin Mail::SpamAssassin::Plugin::HTMLEval
loadplugin Mail::SpamAssassin::Plugin::HTTPSMismatch
loadplugin Mail::SpamAssassin::Plugin::ImageInfo
loadplugin Mail::SpamAssassin::Plugin::MIMEEval
loadplugin Mail::SpamAssassin::Plugin::MIMEHeader
loadplugin Mail::SpamAssassin::Plugin::OLEVBMacro
loadplugin Mail::SpamAssassin::Plugin::PDFInfo
loadplugin Mail::SpamAssassin::Plugin::Phishing
loadplugin Mail::SpamAssassin::Plugin::Pyzor
loadplugin Mail::SpamAssassin::Plugin::Razor2
loadplugin Mail::SpamAssassin::Plugin::RelayEval
loadplugin Mail::SpamAssassin::Plugin::ReplaceTags
loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
loadplugin Mail::SpamAssassin::Plugin::SpamCop
loadplugin Mail::SpamAssassin::Plugin::TextCat
loadplugin Mail::SpamAssassin::Plugin::TxRep
loadplugin Mail::SpamAssassin::Plugin::URIDetail
loadplugin Mail::SpamAssassin::Plugin::URIEval
loadplugin Mail::SpamAssassin::Plugin::VBounce
loadplugin Mail::SpamAssassin::Plugin::WelcomeListSubject
loadplugin Mail::SpamAssassin::Plugin::WLBLEval
Some of these require extra helper packages or Perl libraries to be installed. See the comments in the relevant *.pre files. My ~/.spamassassin/user_prefs file contains the following configuration:
required_hits   5
ok_locales en fr
# Bayes options
score BAYES_00 -4.0
score BAYES_40 -0.5
score BAYES_60 1.0
score BAYES_80 2.7
score BAYES_95 4.0
score BAYES_99 6.0
bayes_auto_learn 1
bayes_ignore_header X-Miltered
bayes_ignore_header X-MIME-Autoconverted
bayes_ignore_header X-Evolution
bayes_ignore_header X-Virus-Scanned
bayes_ignore_header X-Forwarded-For
bayes_ignore_header X-Forwarded-By
bayes_ignore_header X-Scanned-By
bayes_ignore_header X-Spam-Level
bayes_ignore_header X-Spam-Status
as well as manual score reductions due to false positives, and manual score increases to help push certain types of spam emails over the 12.0 definitely spam threshold. Finally, I have the FuzzyOCR package installed since it has occasionally flagged some spam that other tools had missed. It is a little resource intensive though and so you may want to avoid this one if you are filtering spam for other people. As always, feel free to leave a comment if you do something else that works well and that's not included in my setup. This is a work-in-progress.

Antoine Beaupr : (Re)introducing screentest

I have accidentally rewritten screentest, an old X11/GTK2 program that I was previously using to, well, test screens.

Screentest is dead It was removed from Debian in May 2023 but had already missed two releases (Debian 11 "bullseye" and 12 "bookworm") due to release critical bugs. The stated reason for removal was:
The package is orphaned and its upstream is no longer developed. It depends on gtk2, has a low popcon and no reverse dependencies.
So I had little hope to see this program back in Debian. The git repository shows little activity, the last being two years ago. Interestingly, I do not quite remember what it was testing, but I do remember it to find dead pixels, confirm native resolution, and various pixel-peeping. Here's a screenshot of one of the screentest screens: screentest screenshot showing a white-on-black checkered background, with some circles in the corners, shades of gray and colors in the middle Now, I think it's safe to assume this program is dead and buried, and anyways I'm running wayland now, surely there's something better? Well, no. Of course not. Someone would know about it and tell me before I go on a random coding spree in a fit of procrastination... riiight? At least, the Debconf video team didn't seem to know of any replacement. They actually suggested I just "invoke gstreamer directly" and "embrace the joy of shell scripting".

Screentest reborn So, I naively did exactly that and wrote a horrible shell script. Then I realized the next step was to write an command line parser and monitor geometry guessing, and thought "NOPE, THIS IS WHERE THE SHELL STOPS", and rewrote the whole thing in Python. Now, screentest lives as a ~400-line Python script, half of which is unit test data and command-line parsing.

Why screentest Some smarty pants is going to complain and ask why the heck one would need something like that (and, well, someone already did), so maybe I can lay down a list of use case:
  • testing color output, in broad terms (answering the question of "is it just me or this project really yellow?")
  • testing focus and keystone ("this looks blurry, can you find a nice sharp frame in that movie to adjust focus?")
  • test for native resolution and sharpness ("does this projector really support 4k for 30$? that sounds like bullcrap")
  • looking for dead pixels ("i have a new monitor, i hope it's intact")

What does screentest do? Screentest displays a series of "patterns" on screen. The list of patterns is actually hardcoded in the script, copy-pasted from this list from the videotestsrc gstreamer plugin, but you can pass any pattern supported by your gstreamer installation with --patterns. A list of patterns relevant to your installation is available with the gst-inspect-1.0 videotestsrc command. By default, screentest goes through all patterns. Each pattern runs indefinitely until the you close the window, then the next pattern starts. You can restrict to a subset of patterns, for example this would be a good test for dead pixels:
screentest --patterns black,white,red,green,blue
This would be a good sharpness test:
screentest --patterns pinwheel,spokes,checkers-1,checkers-2,checkers-4,checkers-8
A good generic test is the classic SMPTE color bars and is the first in the list, but you can run only that test with:
screentest --patterns smpte
(I will mention, by the way, that as a system administrator with decades of experience, it is nearly impossible to type SMPTE without first typing SMTP and re-typing it again a few times before I get it right. I fully expect this post to have numerous typos.)
Here's an example of the SMPTE pattern from Wikipedia: SMPTE color bars For multi-monitor setups, screentest also supports specifying which output to use as a native resolution, with --output. Failing that, it will try to look at the outputs and use the first it will find. If it fails to find anything, you can specify a resolution with --resolution WIDTHxHEIGHT. I have tried to make it go full screen by default, but stumbled a bug in Sway that crashes gst-launch. If your Wayland compositor supports it, you can possibly enable full screen with --sink waylandsink fullscreen=true. Otherwise it will create a new window that you will have to make fullscreen yourself. For completeness, there's also an --audio flag that will emit the classic "drone", a sine wave at 440Hz at 40% volume (the audiotestsrc gstreamer plugin. And there's a --overlay-name option to show the pattern name, in case you get lost and want to start with one of them again.

How this works Most of the work is done by gstreamer. The script merely generates a pipeline and calls gst-launch to show the output. That both limits what it can do but also makes it much easier to use than figuring out gst-launch. There might be some additional patterns that could be useful, but I think those are better left to gstreamer. I, for example, am somewhat nostalgic of the Philips circle pattern that used to play for TV stations that were off-air in my area. But that, in my opinion, would be better added to the gstreamer plugin than into a separate thing. The script shows which command is being ran, so it's a good introduction to gstreamer pipelines. Advanced users (and the video team) will possibly not need screentest and will design their own pipelines with their own tools. I've previously worked with ffmpeg pipelines (in another such procrastinated coding spree, video-proxy-magic), and I found gstreamer more intuitive, even though it might be slightly less powerful. In retrospect, I should probably have picked a new name, to avoid crashing the namespace already used by the project, which is now on GitHub. Who knows, it might come back to life after this blog post; it would not be the first time. For now, the project lives along side the rest of my scripts collection but if there's sufficient interest, I might move it to its own git repositories. Comments, feedback, contributions are as usual welcome. And naturally, if you know something better for this kind of stuff, I'm happy to learn more about your favorite tool! So now I have finally found something to test my projector, which will likely confirm what I've already known all along: that it's kind of a piece of crap and I need to get a proper one.

13 December 2023

Melissa Wen: 15 Tips for Debugging Issues in the AMD Display Kernel Driver

A self-help guide for examining and debugging the AMD display driver within the Linux kernel/DRM subsystem. It s based on my experience as an external developer working on the driver, and are shared with the goal of helping others navigate the driver code. Acknowledgments: These tips were gathered thanks to the countless help received from AMD developers during the driver development process. The list below was obtained by examining open source code, reviewing public documentation, playing with tools, asking in public forums and also with the help of my former GSoC mentor, Rodrigo Siqueira.

Pre-Debugging Steps: Before diving into an issue, it s crucial to perform two essential steps: 1) Check the latest changes: Ensure you re working with the latest AMD driver modifications located in the amd-staging-drm-next branch maintained by Alex Deucher. You may also find bug fixes for newer kernel versions on branches that have the name pattern drm-fixes-<date>. 2) Examine the issue tracker: Confirm that your issue isn t already documented and addressed in the AMD display driver issue tracker. If you find a similar issue, you can team up with others and speed up the debugging process.

Understanding the issue: Do you really need to change this? Where should you start looking for changes? 3) Is the issue in the AMD kernel driver or in the userspace?: Identifying the source of the issue is essential regardless of the GPU vendor. Sometimes this can be challenging so here are some helpful tips:
  • Record the screen: Capture the screen using a recording app while experiencing the issue. If the bug appears in the capture, it s likely a userspace issue, not the kernel display driver.
  • Analyze the dmesg log: Look for error messages related to the display driver in the dmesg log. If the error message appears before the message [drm] Display Core v... , it s not likely a display driver issue. If this message doesn t appear in your log, the display driver wasn t fully loaded and you will see a notification that something went wrong here.
4) AMD Display Manager vs. AMD Display Core: The AMD display driver consists of two components:
  • Display Manager (DM): This component interacts directly with the Linux DRM infrastructure. Occasionally, issues can arise from misinterpretations of DRM properties or features. If the issue doesn t occur on other platforms with the same AMD hardware - for example, only happens on Linux but not on Windows - it s more likely related to the AMD DM code.
  • Display Core (DC): This is the platform-agnostic part responsible for setting and programming hardware features. Modifications to the DC usually require validation on other platforms, like Windows, to avoid regressions.
5) Identify the DC HW family: Each AMD GPU has variations in its hardware architecture. Features and helpers differ between families, so determining the relevant code for your specific hardware is crucial.
  • Find GPU product information in Linux/AMD GPU documentation
  • Check the dmesg log for the Display Core version (since this commit in Linux kernel 6.3v). For example:
    • [drm] Display Core v3.2.241 initialized on DCN 2.1
    • [drm] Display Core v3.2.237 initialized on DCN 3.0.1

Investigating the relevant driver code: Keep from letting unrelated driver code to affect your investigation. 6) Narrow the code inspection down to one DC HW family: the relevant code resides in a directory named after the DC number. For example, the DCN 3.0.1 driver code is located at drivers/gpu/drm/amd/display/dc/dcn301. We all know that the AMD s shared code is huge and you can use these boundaries to rule out codes unrelated to your issue. 7) Newer families may inherit code from older ones: you can find dcn301 using code from dcn30, dcn20, dcn10 files. It s crucial to verify which hooks and helpers your driver utilizes to investigate the right portion. You can leverage ftrace for supplemental validation. To give an example, it was useful when I was updating DCN3 color mapping to correctly use their new post-blending color capabilities, such as: Additionally, you can use two different HW families to compare behaviours. If you see the issue in one but not in the other, you can compare the code and understand what has changed and if the implementation from a previous family doesn t fit well the new HW resources or design. You can also count on the help of the community on the Linux AMD issue tracker to validate your code on other hardware and/or systems. This approach helped me debug a 2-year-old issue where the cursor gamma adjustment was incorrect in DCN3 hardware, but working correctly for DCN2 family. I solved the issue in two steps, thanks for community feedback and validation: 8) Check the hardware capability screening in the driver: You can currently find a list of display hardware capabilities in the drivers/gpu/drm/amd/display/dc/dcn*/dcn*_resource.c file. More precisely in the dcn*_resource_construct() function. Using DCN301 for illustration, here is the list of its hardware caps:
	/*************************************************
	 *  Resource + asic cap harcoding                *
	 *************************************************/
	pool->base.underlay_pipe_index = NO_UNDERLAY_PIPE;
	pool->base.pipe_count = pool->base.res_cap->num_timing_generator;
	pool->base.mpcc_count = pool->base.res_cap->num_timing_generator;
	dc->caps.max_downscale_ratio = 600;
	dc->caps.i2c_speed_in_khz = 100;
	dc->caps.i2c_speed_in_khz_hdcp = 5; /*1.4 w/a enabled by default*/
	dc->caps.max_cursor_size = 256;
	dc->caps.min_horizontal_blanking_period = 80;
	dc->caps.dmdata_alloc_size = 2048;
	dc->caps.max_slave_planes = 2;
	dc->caps.max_slave_yuv_planes = 2;
	dc->caps.max_slave_rgb_planes = 2;
	dc->caps.is_apu = true;
	dc->caps.post_blend_color_processing = true;
	dc->caps.force_dp_tps4_for_cp2520 = true;
	dc->caps.extended_aux_timeout_support = true;
	dc->caps.dmcub_support = true;
	/* Color pipeline capabilities */
	dc->caps.color.dpp.dcn_arch = 1;
	dc->caps.color.dpp.input_lut_shared = 0;
	dc->caps.color.dpp.icsc = 1;
	dc->caps.color.dpp.dgam_ram = 0; // must use gamma_corr
	dc->caps.color.dpp.dgam_rom_caps.srgb = 1;
	dc->caps.color.dpp.dgam_rom_caps.bt2020 = 1;
	dc->caps.color.dpp.dgam_rom_caps.gamma2_2 = 1;
	dc->caps.color.dpp.dgam_rom_caps.pq = 1;
	dc->caps.color.dpp.dgam_rom_caps.hlg = 1;
	dc->caps.color.dpp.post_csc = 1;
	dc->caps.color.dpp.gamma_corr = 1;
	dc->caps.color.dpp.dgam_rom_for_yuv = 0;
	dc->caps.color.dpp.hw_3d_lut = 1;
	dc->caps.color.dpp.ogam_ram = 1;
	// no OGAM ROM on DCN301
	dc->caps.color.dpp.ogam_rom_caps.srgb = 0;
	dc->caps.color.dpp.ogam_rom_caps.bt2020 = 0;
	dc->caps.color.dpp.ogam_rom_caps.gamma2_2 = 0;
	dc->caps.color.dpp.ogam_rom_caps.pq = 0;
	dc->caps.color.dpp.ogam_rom_caps.hlg = 0;
	dc->caps.color.dpp.ocsc = 0;
	dc->caps.color.mpc.gamut_remap = 1;
	dc->caps.color.mpc.num_3dluts = pool->base.res_cap->num_mpc_3dlut; //2
	dc->caps.color.mpc.ogam_ram = 1;
	dc->caps.color.mpc.ogam_rom_caps.srgb = 0;
	dc->caps.color.mpc.ogam_rom_caps.bt2020 = 0;
	dc->caps.color.mpc.ogam_rom_caps.gamma2_2 = 0;
	dc->caps.color.mpc.ogam_rom_caps.pq = 0;
	dc->caps.color.mpc.ogam_rom_caps.hlg = 0;
	dc->caps.color.mpc.ocsc = 1;
	dc->caps.dp_hdmi21_pcon_support = true;
	/* read VBIOS LTTPR caps */
	if (ctx->dc_bios->funcs->get_lttpr_caps)  
		enum bp_result bp_query_result;
		uint8_t is_vbios_lttpr_enable = 0;
		bp_query_result = ctx->dc_bios->funcs->get_lttpr_caps(ctx->dc_bios, &is_vbios_lttpr_enable);
		dc->caps.vbios_lttpr_enable = (bp_query_result == BP_RESULT_OK) && !!is_vbios_lttpr_enable;
	 
	if (ctx->dc_bios->funcs->get_lttpr_interop)  
		enum bp_result bp_query_result;
		uint8_t is_vbios_interop_enabled = 0;
		bp_query_result = ctx->dc_bios->funcs->get_lttpr_interop(ctx->dc_bios, &is_vbios_interop_enabled);
		dc->caps.vbios_lttpr_aware = (bp_query_result == BP_RESULT_OK) && !!is_vbios_interop_enabled;
	 
Keep in mind that the documentation of color capabilities are available at the Linux kernel Documentation.

Understanding the development history: What has brought us to the current state? 9) Pinpoint relevant commits: Use git log and git blame to identify commits targeting the code section you re interested in. 10) Track regressions: If you re examining the amd-staging-drm-next branch, check for regressions between DC release versions. These are defined by DC_VER in the drivers/gpu/drm/amd/display/dc/dc.h file. Alternatively, find a commit with this format drm/amd/display: 3.2.221 that determines a display release. It s useful for bisecting. This information helps you understand how outdated your branch is and identify potential regressions. You can consider each DC_VER takes around one week to be bumped. Finally, check testing log of each release in the report provided on the amd-gfx mailing list, such as this one Tested-by: Daniel Wheeler:

Reducing the inspection area: Focus on what really matters. 11) Identify involved HW blocks: This helps isolate the issue. You can find more information about DCN HW blocks in the DCN Overview documentation. In summary:
  • Plane issues are closer to HUBP and DPP.
  • Blending/Stream issues are closer to MPC, OPP and OPTC. They are related to DRM CRTC subjects.
This information was useful when debugging a hardware rotation issue where the cursor plane got clipped off in the middle of the screen. Finally, the issue was addressed by two patches: 12) Issues around bandwidth (glitches) and clocks: May be affected by calculations done in these HW blocks and HW specific values. The recalculation equations are found in the DML folder. DML stands for Display Mode Library. It s in charge of all required configuration parameters supported by the hardware for multiple scenarios. See more in the AMD DC Overview kernel docs. It s a math library that optimally configures hardware to find the best balance between power efficiency and performance in a given scenario. Finding some clk variables that affect device behavior may be a sign of it. It s hard for a external developer to debug this part, since it involves information from HW specs and firmware programming that we don t have access. The best option is to provide all relevant debugging information you have and ask AMD developers to check the values from your suspicions.
  • Do a trick: If you suspect the power setup is degrading performance, try setting the amount of power supplied to the GPU to the maximum and see if it affects the system behavior with this command: sudo bash -c "echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level"
I learned it when debugging glitches with hardware cursor rotation on Steam Deck. My first attempt was changing the clock calculation. In the end, Rodrigo Siqueira proposed the right solution targeting bandwidth in two steps:

Checking implicit programming and hardware limitations: Bring implicit programming to the level of consciousness and recognize hardware limitations. 13) Implicit update types: Check if the selected type for atomic update may affect your issue. The update type depends on the mode settings, since programming some modes demands more time for hardware processing. More details in the source code:
/* Surface update type is used by dc_update_surfaces_and_stream
 * The update type is determined at the very beginning of the function based
 * on parameters passed in and decides how much programming (or updating) is
 * going to be done during the call.
 *
 * UPDATE_TYPE_FAST is used for really fast updates that do not require much
 * logical calculations or hardware register programming. This update MUST be
 * ISR safe on windows. Currently fast update will only be used to flip surface
 * address.
 *
 * UPDATE_TYPE_MED is used for slower updates which require significant hw
 * re-programming however do not affect bandwidth consumption or clock
 * requirements. At present, this is the level at which front end updates
 * that do not require us to run bw_calcs happen. These are in/out transfer func
 * updates, viewport offset changes, recout size changes and pixel
depth changes.
 * This update can be done at ISR, but we want to minimize how often
this happens.
 *
 * UPDATE_TYPE_FULL is slow. Really slow. This requires us to recalculate our
 * bandwidth and clocks, possibly rearrange some pipes and reprogram
anything front
 * end related. Any time viewport dimensions, recout dimensions,
scaling ratios or
 * gamma need to be adjusted or pipe needs to be turned on (or
disconnected) we do
 * a full update. This cannot be done at ISR level and should be a rare event.
 * Unless someone is stress testing mpo enter/exit, playing with
colour or adjusting
 * underscan we don't expect to see this call at all.
 */
enum surface_update_type  
UPDATE_TYPE_FAST, /* super fast, safe to execute in isr */
UPDATE_TYPE_MED,  /* ISR safe, most of programming needed, no bw/clk change*/
UPDATE_TYPE_FULL, /* may need to shuffle resources */
 ;

Using tools: Observe the current state, validate your findings, continue improvements. 14) Use AMD tools to check hardware state and driver programming: help on understanding your driver settings and checking the behavior when changing those settings.
  • DC Visual confirmation: Check multiple planes and pipe split policy.
  • DTN logs: Check display hardware state, including rotation, size, format, underflow, blocks in use, color block values, etc.
  • UMR: Check ASIC info, register values, KMS state - links and elements (framebuffers, planes, CRTCs, connectors). Source: UMR project documentation
15) Use generic DRM/KMS tools:
  • IGT test tools: Use generic KMS tests or develop your own to isolate the issue in the kernel space. Compare results across different GPU vendors to understand their implementations and find potential solutions. Here AMD also has specific IGT tests for its GPUs that is expect to work without failures on any AMD GPU. You can check results of HW-specific tests using different display hardware families or you can compare expected differences between the generic workflow and AMD workflow.
  • drm_info: This tool summarizes the current state of a display driver (capabilities, properties and formats) per element of the DRM/KMS workflow. Output can be helpful when reporting bugs.

Don t give up! Debugging issues in the AMD display driver can be challenging, but by following these tips and leveraging available resources, you can significantly improve your chances of success. Worth mentioning: This blog post builds upon my talk, I m not an AMD expert, but presented at the 2022 XDC. It shares guidelines that helped me debug AMD display issues as an external developer of the driver. Open Source Display Driver: The Linux kernel/AMD display driver is open source, allowing you to actively contribute by addressing issues listed in the official tracker. Tackling existing issues or resolving your own can be a rewarding learning experience and a valuable contribution to the community. Additionally, the tracker serves as a valuable resource for finding similar bugs, troubleshooting tips, and suggestions from AMD developers. Finally, it s a platform for seeking help when needed. Remember, contributing to the open source community through issue resolution and collaboration is mutually beneficial for everyone involved.

9 December 2023

Simon Josefsson: Classic McEliece goes to IETF and OpenSSH

My earlier work on Streamlined NTRU Prime has been progressing along. The IETF document on sntrup761 in SSH has passed several process points. GnuPG s libgcrypt has added support for sntrup761. The libssh support for sntrup761 is working, but the merge request is stuck mostly due to lack of time to debug why the regression test suite sporadically errors out in non-sntrup761 related parts with the patch. The foundation for lattice-based post-quantum algorithms has some uncertainty around it, and I have felt that there is more to the post-quantum story than adding sntrup761 to implementations. Classic McEliece has been mentioned to me a couple of times, and I took some time to learn it and did a cut n paste job of the proposed ISO standard and published draft-josefsson-mceliece in the IETF to make the algorithm easily available to the IETF community. A high-quality implementation of Classic McEliece has been published as libmceliece and I ve been supporting the work of Jan Moj to package libmceliece for Debian, alas it has been stuck in the ftp-master NEW queue for manual review for over two months. The pre-dependencies librandombytes and libcpucycles are available in Debian already. All that text writing and packaging work set the scene to write some code. When I added support for sntrup761 in libssh, I became familiar with the OpenSSH code base, so it was natural to return to OpenSSH to experiment with a new SSH KEX for Classic McEliece. DJB suggested to pick mceliece6688128 and combine it with the existing X25519+sntrup761 or with plain X25519. While a three-algorithm hybrid between X25519, sntrup761 and mceliece6688128 would be a simple drop-in for those that don t want to lose the benefits offered by sntrup761, I decided to start the journey on a pure combination of X25519 with mceliece6688128. The key combiner in sntrup761x25519 is a simple SHA512 call and the only good I can say about that is that it is simple to describe and implement, and doesn t raise too many questions since it is already deployed. After procrastinating coding for months, once I sat down to work it only took a couple of hours until I had a successful Classic McEliece SSH connection. I suppose my brain had sorted everything in background before I started. To reproduce it, please try the following in a Debian testing environment (I use podman to get a clean environment).
# podman run -it --rm debian:testing-slim
apt update
apt dist-upgrade -y
apt install -y wget python3 librandombytes-dev libcpucycles-dev gcc make git autoconf libz-dev libssl-dev
cd ~
wget -q -O- https://lib.mceliece.org/libmceliece-20230612.tar.gz   tar xfz -
cd libmceliece-20230612/
./configure
make install
ldconfig
cd ..
git clone https://gitlab.com/jas/openssh-portable
cd openssh-portable
git checkout jas/mceliece
autoreconf
./configure # verify 'libmceliece support: yes'
make # CC="cc -DDEBUG_KEX=1 -DDEBUG_KEXDH=1 -DDEBUG_KEXECDH=1"
You should now have a working SSH client and server that supports Classic McEliece! Verify support by running ./ssh -Q kex and it should mention mceliece6688128x25519-sha512@openssh.com. To have it print plenty of debug outputs, you may remove the # character on the final line, but don t use such a build in production. You can test it as follows:
./ssh-keygen -A # writes to /usr/local/etc/ssh_host_...
# setup public-key based login by running the following:
./ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
adduser --system sshd
mkdir /var/empty
while true; do $PWD/sshd -p 2222 -f /dev/null; done &
./ssh -v -p 2222 localhost -oKexAlgorithms=mceliece6688128x25519-sha512@openssh.com date
On the client you should see output like this:
OpenSSH_9.5p1, OpenSSL 3.0.11 19 Sep 2023
...
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: mceliece6688128x25519-sha512@openssh.com
debug1: kex: host key algorithm: ssh-ed25519
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ssh-ed25519 SHA256:YognhWY7+399J+/V8eAQWmM3UFDLT0dkmoj3pIJ0zXs
...
debug1: Host '[localhost]:2222' is known and matches the ED25519 host key.
debug1: Found key in /root/.ssh/known_hosts:1
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey in after 134217728 blocks
...
debug1: Sending command: date
debug1: pledge: fork
debug1: permanently_set_uid: 0/0
Environment:
  USER=root
  LOGNAME=root
  HOME=/root
  PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin
  MAIL=/var/mail/root
  SHELL=/bin/bash
  SSH_CLIENT=::1 46894 2222
  SSH_CONNECTION=::1 46894 ::1 2222
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: client_input_channel_req: channel 0 rtype eow@openssh.com reply 0
Sat Dec  9 22:22:40 UTC 2023
debug1: channel 0: free: client-session, nchannels 1
Transferred: sent 1048044, received 3500 bytes, in 0.0 seconds
Bytes per second: sent 23388935.4, received 78108.6
debug1: Exit status 0
Notice the kex: algorithm: mceliece6688128x25519-sha512@openssh.com output. How about network bandwidth usage? Below is a comparison of a complete SSH client connection such as the one above that log in and print date and logs out. Plain X25519 is around 7kb, X25519 with sntrup761 is around 9kb, and mceliece6688128 with X25519 is around 1MB. Yes, Classic McEliece has large keys, but for many environments, 1MB of data for the session establishment will barely be noticeable.
./ssh -v -p 2222 localhost -oKexAlgorithms=curve25519-sha256 date 2>&1   grep ^Transferred
Transferred: sent 3028, received 3612 bytes, in 0.0 seconds
./ssh -v -p 2222 localhost -oKexAlgorithms=sntrup761x25519-sha512@openssh.com date 2>&1   grep ^Transferred
Transferred: sent 4212, received 4596 bytes, in 0.0 seconds
./ssh -v -p 2222 localhost -oKexAlgorithms=mceliece6688128x25519-sha512@openssh.com date 2>&1   grep ^Transferred
Transferred: sent 1048044, received 3764 bytes, in 0.0 seconds
So how about session establishment time?
date; i=0; while test $i -le 100; do ./ssh -v -p 2222 localhost -oKexAlgorithms=curve25519-sha256 date > /dev/null 2>&1; i= expr $i + 1 ; done; date
Sat Dec  9 22:39:19 UTC 2023
Sat Dec  9 22:39:25 UTC 2023
# 6 seconds
date; i=0; while test $i -le 100; do ./ssh -v -p 2222 localhost -oKexAlgorithms=sntrup761x25519-sha512@openssh.com date > /dev/null 2>&1; i= expr $i + 1 ; done; date
Sat Dec  9 22:39:29 UTC 2023
Sat Dec  9 22:39:38 UTC 2023
# 9 seconds
date; i=0; while test $i -le 100; do ./ssh -v -p 2222 localhost -oKexAlgorithms=mceliece6688128x25519-sha512@openssh.com date > /dev/null 2>&1; i= expr $i + 1 ; done; date
Sat Dec  9 22:39:55 UTC 2023
Sat Dec  9 22:40:07 UTC 2023
# 12 seconds
I never noticed adding sntrup761, so I m pretty sure I wouldn t notice this increase either. This is all running on my laptop that runs Trisquel so take it with a grain of salt but at least the magnitude is clear. Future work items include: Happy post-quantum SSH ing! Update: Changing the mceliece6688128_keypair call to mceliece6688128f_keypair (i.e., using the fully compatible f-variant) results in McEliece being just as fast as sntrup761 on my machine. Update 2023-12-26: An initial IETF document draft-josefsson-ssh-mceliece-00 published.

6 December 2023

Reproducible Builds: Reproducible Builds in November 2023

Welcome to the November 2023 report from the Reproducible Builds project! In these reports we outline the most important things that we have been up to over the past month. As a rather rapid recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries (more).

Reproducible Builds Summit 2023 Between October 31st and November 2nd, we held our seventh Reproducible Builds Summit in Hamburg, Germany! Amazingly, the agenda and all notes from all sessions are all online many thanks to everyone who wrote notes from the sessions. As a followup on one idea, started at the summit, Alexander Couzens and Holger Levsen started work on a cache (or tailored front-end) for the snapshot.debian.org service. The general idea is that, when rebuilding Debian, you do not actually need the whole ~140TB of data from snapshot.debian.org; rather, only a very small subset of the packages are ever used for for building. It turns out, for amd64, arm64, armhf, i386, ppc64el, riscv64 and s390 for Debian trixie, unstable and experimental, this is only around 500GB ie. less than 1%. Although the new service not yet ready for usage, it has already provided a promising outlook in this regard. More information is available on https://rebuilder-snapshot.debian.net and we hope that this service becomes usable in the coming weeks. The adjacent picture shows a sticky note authored by Jan-Benedict Glaw at the summit in Hamburg, confirming Holger Levsen s theory that rebuilding all Debian packages needs a very small subset of packages, the text states that 69,200 packages (in Debian sid) list 24,850 packages in their .buildinfo files, in 8,0200 variations. This little piece of paper was the beginning of rebuilder-snapshot and is a direct outcome of the summit! The Reproducible Builds team would like to thank our event sponsors who include Mullvad VPN, openSUSE, Debian, Software Freedom Conservancy, Allotropia and Aspiration Tech.

Beyond Trusting FOSS presentation at SeaGL On November 4th, Vagrant Cascadian presented Beyond Trusting FOSS at SeaGL in Seattle, WA in the United States. Founded in 2013, SeaGL is a free, grassroots technical summit dedicated to spreading awareness and knowledge about free source software, hardware and culture. The summary of Vagrant s talk mentions that it will:
[ ] introduce the concepts of Reproducible Builds, including best practices for developing and releasing software, the tools available to help diagnose issues, and touch on progress towards solving decades-old deeply pervasive fundamental security issues Learn how to verify and demonstrate trust, rather than simply hoping everything is OK!
Germane to the contents of the talk, the slides for Vagrant s talk can be built reproducibly, resulting in a PDF with a SHA1 of cfde2f8a0b7e6ec9b85377eeac0661d728b70f34 when built on Debian bookworm and c21fab273232c550ce822c4b0d9988e6c49aa2c3 on Debian sid at the time of writing.

Human Factors in Software Supply Chain Security Marcel Fourn , Dominik Wermke, Sascha Fahl and Yasemin Acar have published an article in a Special Issue of the IEEE s Security & Privacy magazine. Entitled A Viewpoint on Human Factors in Software Supply Chain Security: A Research Agenda, the paper justifies the need for reproducible builds to reach developers and end-users specifically, and furthermore points out some under-researched topics that we have seen mentioned in interviews. An author pre-print of the article is available in PDF form.

Community updates On our mailing list this month:

openSUSE updates Bernhard M. Wiedemann has created a wiki page outlining an proposal to create a general-purpose Linux distribution which consists of 100% bit-reproducible packages albeit minus the embedded signature within RPM files. It would be based on openSUSE Tumbleweed or, if available, its Slowroll-variant. In addition, Bernhard posted another monthly update for his work elsewhere in openSUSE.

Ubuntu Launchpad now supports .buildinfo files Back in 2017, Steve Langasek filed a bug against Ubuntu s Launchpad code hosting platform to report that .changes files (artifacts of building Ubuntu and Debian packages) reference .buildinfo files that aren t actually exposed by Launchpad itself. This was causing issues when attempting to process .changes files with tools such as Lintian. However, it was noticed last month that, in early August of this year, Simon Quigley had resolved this issue, and .buildinfo files are now available from the Launchpad system.

PHP reproducibility updates There have been two updates from the PHP programming language this month. Firstly, the widely-deployed PHPUnit framework for the PHP programming language have recently released version 10.5.0, which introduces the inclusion of a composer.lock file, ensuring total reproducibility of the shipped binary file. Further details and the discussion that went into their particular implementation can be found on the associated GitHub pull request. In addition, the presentation Leveraging Nix in the PHP ecosystem has been given in late October at the PHP International Conference in Munich by Pol Dellaiera. While the video replay is not yet available, the (reproducible) presentation slides and speaker notes are available.

diffoscope changes diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes, including:
  • Improving DOS/MBR extraction by adding support for 7z. [ ]
  • Adding a missing RequiredToolNotFound import. [ ]
  • As a UI/UX improvement, try and avoid printing an extended traceback if diffoscope runs out of memory. [ ]
  • Mark diffoscope as stable on PyPI.org. [ ]
  • Uploading version 252 to Debian unstable. [ ]

Website updates A huge number of notes were added to our website that were taken at our recent Reproducible Builds Summit held between October 31st and November 2nd in Hamburg, Germany. In particular, a big thanks to Arnout Engelen, Bernhard M. Wiedemann, Daan De Meyer, Evangelos Ribeiro Tzaras, Holger Levsen and Orhun Parmaks z. In addition to this, a number of other changes were made, including:

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In October, a number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Track packages marked as Priority: important in a new package set. [ ][ ]
    • Stop scheduling packages that fail to build from source in bookworm [ ] and bullseye. [ ].
    • Add old releases dashboard link in web navigation. [ ]
    • Permit re-run of the pool_buildinfos script to be re-run for a specific year. [ ]
    • Grant jbglaw access to the osuosl4 node [ ][ ] along with lynxis [ ].
    • Increase RAM on the amd64 Ionos builders from 48 GiB to 64 GiB; thanks IONOS! [ ]
    • Move buster to archived suites. [ ][ ]
    • Reduce the number of arm64 architecture workers from 24 to 16 in order to improve stability [ ], reduce the workers for amd64 from 32 to 28 and, for i386, reduce from 12 down to 8 [ ].
    • Show the entire build history of each Debian package. [ ]
    • Stop scheduling already tested package/version combinations in Debian bookworm. [ ]
  • Snapshot service for rebuilders
    • Add an HTTP-based API endpoint. [ ][ ]
    • Add a Gunicorn instance to serve the HTTP API. [ ]
    • Add an NGINX config [ ][ ][ ][ ]
  • System-health:
    • Detect failures due to HTTP 503 Service Unavailable errors. [ ]
    • Detect failures to update package sets. [ ]
    • Detect unmet dependencies. (This usually occurs with builds of Debian live-build.) [ ]
  • Misc-related changes:
    • do install systemd-ommd on jenkins. [ ]
    • fix harmless typo in squid.conf for codethink04. [ ]
    • fixup: reproducible Debian: add gunicorn service to serve /api for rebuilder-snapshot.d.o. [ ]
    • Increase codethink04 s Squid cache_dir size setting to 16 GiB. [ ]
    • Don t install systemd-oomd as it unfortunately kills sshd [ ]
    • Use debootstrap from backports when commisioning nodes. [ ]
    • Add the live_build_debian_stretch_gnome, debsums-tests_buster and debsums-tests_buster jobs to the zombie list. [ ][ ]
    • Run jekyll build with the --watch argument when building the Reproducible Builds website. [ ]
    • Misc node maintenance. [ ][ ][ ]
Other changes were made as well, however, including Mattia Rizzolo fixing rc.local s Bash syntax so it can actually run [ ], commenting away some file cleanup code that is (potentially) deleting too much [ ] and fixing the html_brekages page for Debian package builds [ ]. Finally, diagnosed and submitted a patch to add a AddEncoding gzip .gz line to the tests.reproducible-builds.org Apache configuration so that Gzip files aren t re-compressed as Gzip which some clients can t deal with (as well as being a waste of time). [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

21 November 2023

Mike Hommey: How I (kind of) killed Mercurial at Mozilla

Did you hear the news? Firefox development is moving from Mercurial to Git. While the decision is far from being mine, and I was barely involved in the small incremental changes that ultimately led to this decision, I feel I have to take at least some responsibility. And if you are one of those who would rather use Mercurial than Git, you may direct all your ire at me. But let's take a step back and review the past 25 years leading to this decision. You'll forgive me for skipping some details and any possible inaccuracies. This is already a long post, while I could have been more thorough, even I think that would have been too much. This is also not an official Mozilla position, only my personal perception and recollection as someone who was involved at times, but mostly an observer from a distance. From CVS to DVCS From its release in 1998, the Mozilla source code was kept in a CVS repository. If you're too young to know what CVS is, let's just say it's an old school version control system, with its set of problems. Back then, it was mostly ubiquitous in the Open Source world, as far as I remember. In the early 2000s, the Subversion version control system gained some traction, solving some of the problems that came with CVS. Incidentally, Subversion was created by Jim Blandy, who now works at Mozilla on completely unrelated matters. In the same period, the Linux kernel development moved from CVS to Bitkeeper, which was more suitable to the distributed nature of the Linux community. BitKeeper had its own problem, though: it was the opposite of Open Source, but for most pragmatic people, it wasn't a real concern because free access was provided. Until it became a problem: someone at OSDL developed an alternative client to BitKeeper, and licenses of BitKeeper were rescinded for OSDL members, including Linus Torvalds (they were even prohibited from purchasing one). Following this fiasco, in April 2005, two weeks from each other, both Git and Mercurial were born. The former was created by Linus Torvalds himself, while the latter was developed by Olivia Mackall, who was a Linux kernel developer back then. And because they both came out of the same community for the same needs, and the same shared experience with BitKeeper, they both were similar distributed version control systems. Interestingly enough, several other DVCSes existed: In this landscape, the major difference Git was making at the time was that it was blazing fast. Almost incredibly so, at least on Linux systems. That was less true on other platforms (especially Windows). It was a game-changer for handling large codebases in a smooth manner. Anyways, two years later, in 2007, Mozilla decided to move its source code not to Bzr, not to Git, not to Subversion (which, yes, was a contender), but to Mercurial. The decision "process" was laid down in two rather colorful blog posts. My memory is a bit fuzzy, but I don't recall that it was a particularly controversial choice. All of those DVCSes were still young, and there was no definite "winner" yet (GitHub hadn't even been founded). It made the most sense for Mozilla back then, mainly because the Git experience on Windows still wasn't there, and that mattered a lot for Mozilla, with its diverse platform support. As a contributor, I didn't think much of it, although to be fair, at the time, I was mostly consuming the source tarballs. Personal preferences Digging through my archives, I've unearthed a forgotten chapter: I did end up setting up both a Mercurial and a Git mirror of the Firefox source repository on alioth.debian.org. Alioth.debian.org was a FusionForge-based collaboration system for Debian developers, similar to SourceForge. It was the ancestor of salsa.debian.org. I used those mirrors for the Debian packaging of Firefox (cough cough Iceweasel). The Git mirror was created with hg-fast-export, and the Mercurial mirror was only a necessary step in the process. By that time, I had converted my Subversion repositories to Git, and switched off SVK. Incidentally, I started contributing to Git around that time as well. I apparently did this not too long after Mozilla switched to Mercurial. As a Linux user, I think I just wanted the speed that Mercurial was not providing. Not that Mercurial was that slow, but the difference between a couple seconds and a couple hundred milliseconds was a significant enough difference in user experience for me to prefer Git (and Firefox was not the only thing I was using version control for) Other people had also similarly created their own mirror, or with other tools. But none of them were "compatible": their commit hashes were different. Hg-git, used by the latter, was putting extra information in commit messages that would make the conversion differ, and hg-fast-export would just not be consistent with itself! My mirror is long gone, and those have not been updated in more than a decade. I did end up using Mercurial, when I got commit access to the Firefox source repository in April 2010. I still kept using Git for my Debian activities, but I now was also using Mercurial to push to the Mozilla servers. I joined Mozilla as a contractor a few months after that, and kept using Mercurial for a while, but as a, by then, long time Git user, it never really clicked for me. It turns out, the sentiment was shared by several at Mozilla. Git incursion In the early 2010s, GitHub was becoming ubiquitous, and the Git mindshare was getting large. Multiple projects at Mozilla were already entirely hosted on GitHub. As for the Firefox source code base, Mozilla back then was kind of a Wild West, and engineers being engineers, multiple people had been using Git, with their own inconvenient workflows involving a local Mercurial clone. The most popular set of scripts was moz-git-tools, to incorporate changes in a local Git repository into the local Mercurial copy, to then send to Mozilla servers. In terms of the number of people doing that, though, I don't think it was a lot of people, probably a few handfuls. On my end, I was still keeping up with Mercurial. I think at that time several engineers had their own unofficial Git mirrors on GitHub, and later on Ehsan Akhgari provided another mirror, with a twist: it also contained the full CVS history, which the canonical Mercurial repository didn't have. This was particularly interesting for engineers who needed to do some code archeology and couldn't get past the 2007 cutoff of the Mercurial repository. I think that mirror ultimately became the official-looking, but really unofficial, mozilla-central repository on GitHub. On a side note, a Mercurial repository containing the CVS history was also later set up, but that didn't lead to something officially supported on the Mercurial side. Some time around 2011~2012, I started to more seriously consider using Git for work myself, but wasn't satisfied with the workflows others had set up for themselves. I really didn't like the idea of wasting extra disk space keeping a Mercurial clone around while using a Git mirror. I wrote a Python script that would use Mercurial as a library to access a remote repository and produce a git-fast-import stream. That would allow the creation of a git repository without a local Mercurial clone. It worked quite well, but it was not able to incrementally update. Other, more complete tools existed already, some of which I mentioned above. But as time was passing and the size and depth of the Mercurial repository was growing, these tools were showing their limits and were too slow for my taste, especially for the initial clone. Boot to Git In the same time frame, Mozilla ventured in the Mobile OS sphere with Boot to Gecko, later known as Firefox OS. What does that have to do with version control? The needs of third party collaborators in the mobile space led to the creation of what is now the gecko-dev repository on GitHub. As I remember it, it was challenging to create, but once it was there, Git users could just clone it and have a working, up-to-date local copy of the Firefox source code and its history... which they could already have, but this was the first officially supported way of doing so. Coincidentally, Ehsan's unofficial mirror was having trouble (to the point of GitHub closing the repository) and was ultimately shut down in December 2013. You'll often find comments on the interwebs about how GitHub has become unreliable since the Microsoft acquisition. I can't really comment on that, but if you think GitHub is unreliable now, rest assured that it was worse in its beginning. And its sustainability as a platform also wasn't a given, being a rather new player. So on top of having this official mirror on GitHub, Mozilla also ventured in setting up its own Git server for greater control and reliability. But the canonical repository was still the Mercurial one, and while Git users now had a supported mirror to pull from, they still had to somehow interact with Mercurial repositories, most notably for the Try server. Git slowly creeping in Firefox build tooling Still in the same time frame, tooling around building Firefox was improving drastically. For obvious reasons, when version control integration was needed in the tooling, Mercurial support was always a no-brainer. The first explicit acknowledgement of a Git repository for the Firefox source code, other than the addition of the .gitignore file, was bug 774109. It added a script to install the prerequisites to build Firefox on macOS (still called OSX back then), and that would print a message inviting people to obtain a copy of the source code with either Mercurial or Git. That was a precursor to current bootstrap.py, from September 2012. Following that, as far as I can tell, the first real incursion of Git in the Firefox source tree tooling happened in bug 965120. A few days earlier, bug 952379 had added a mach clang-format command that would apply clang-format-diff to the output from hg diff. Obviously, running hg diff on a Git working tree didn't work, and bug 965120 was filed, and support for Git was added there. That was in January 2014. A year later, when the initial implementation of mach artifact was added (which ultimately led to artifact builds), Git users were an immediate thought. But while they were considered, it was not to support them, but to avoid actively breaking their workflows. Git support for mach artifact was eventually added 14 months later, in March 2016. From gecko-dev to git-cinnabar Let's step back a little here, back to the end of 2014. My user experience with Mercurial had reached a level of dissatisfaction that was enough for me to decide to take that script from a couple years prior and make it work for incremental updates. That meant finding a way to store enough information locally to be able to reconstruct whatever the incremental updates would be relying on (guess why other tools hid a local Mercurial clone under hood). I got something working rather quickly, and after talking to a few people about this side project at the Mozilla Portland All Hands and seeing their excitement, I published a git-remote-hg initial prototype on the last day of the All Hands. Within weeks, the prototype gained the ability to directly push to Mercurial repositories, and a couple months later, was renamed to git-cinnabar. At that point, as a Git user, instead of cloning the gecko-dev repository from GitHub and switching to a local Mercurial repository whenever you needed to push to a Mercurial repository (i.e. the aforementioned Try server, or, at the time, for reviews), you could just clone and push directly from/to Mercurial, all within Git. And it was fast too. You could get a full clone of mozilla-central in less than half an hour, when at the time, other similar tools would take more than 10 hours (needless to say, it's even worse now). Another couple months later (we're now at the end of April 2015), git-cinnabar became able to start off a local clone of the gecko-dev repository, rather than clone from scratch, which could be time consuming. But because git-cinnabar and the tool that was updating gecko-dev weren't producing the same commits, this setup was cumbersome and not really recommended. For instance, if you pushed something to mozilla-central with git-cinnabar from a gecko-dev clone, it would come back with a different commit hash in gecko-dev, and you'd have to deal with the divergence. Eventually, in April 2020, the scripts updating gecko-dev were switched to git-cinnabar, making the use of gecko-dev alongside git-cinnabar a more viable option. Ironically(?), the switch occurred to ease collaboration with KaiOS (you know, the mobile OS born from the ashes of Firefox OS). Well, okay, in all honesty, when the need of syncing in both directions between Git and Mercurial (we only had ever synced from Mercurial to Git) came up, I nudged Mozilla in the direction of git-cinnabar, which, in my (biased but still honest) opinion, was the more reliable option for two-way synchronization (we did have regular conversion problems with hg-git, nothing of the sort has happened since the switch). One Firefox repository to rule them all For reasons I don't know, Mozilla decided to use separate Mercurial repositories as "branches". With the switch to the rapid release process in 2011, that meant one repository for nightly (mozilla-central), one for aurora, one for beta, and one for release. And with the addition of Extended Support Releases in 2012, we now add a new ESR repository every year. Boot to Gecko also had its own branches, and so did Fennec (Firefox for Mobile, before Android). There are a lot of them. And then there are also integration branches, where developer's work lands before being merged in mozilla-central (or backed out if it breaks things), always leaving mozilla-central in a (hopefully) good state. Only one of them remains in use today, though. I can only suppose that the way Mercurial branches work was not deemed practical. It is worth noting, though, that Mercurial branches are used in some cases, to branch off a dot-release when the next major release process has already started, so it's not a matter of not knowing the feature exists or some such. In 2016, Gregory Szorc set up a new repository that would contain them all (or at least most of them), which eventually became what is now the mozilla-unified repository. This would e.g. simplify switching between branches when necessary. 7 years later, for some reason, the other "branches" still exist, but most developers are expected to be using mozilla-unified. Mozilla's CI also switched to using mozilla-unified as base repository. Honestly, I'm not sure why the separate repositories are still the main entry point for pushes, rather than going directly to mozilla-unified, but it probably comes down to switching being work, and not being a top priority. Also, it probably doesn't help that working with multiple heads in Mercurial, even (especially?) with bookmarks, can be a source of confusion. To give an example, if you aren't careful, and do a plain clone of the mozilla-unified repository, you may not end up on the latest mozilla-central changeset, but rather, e.g. one from beta, or some other branch, depending which one was last updated. Hosting is simple, right? Put your repository on a server, install hgweb or gitweb, and that's it? Maybe that works for... Mercurial itself, but that repository "only" has slightly over 50k changesets and less than 4k files. Mozilla-central has more than an order of magnitude more changesets (close to 700k) and two orders of magnitude more files (more than 700k if you count the deleted or moved files, 350k if you count the currently existing ones). And remember, there are a lot of "duplicates" of this repository. And I didn't even mention user repositories and project branches. Sure, it's a self-inflicted pain, and you'd think it could probably(?) be mitigated with shared repositories. But consider the simple case of two repositories: mozilla-central and autoland. You make autoland use mozilla-central as a shared repository. Now, you push something new to autoland, it's stored in the autoland datastore. Eventually, you merge to mozilla-central. Congratulations, it's now in both datastores, and you'd need to clean-up autoland if you wanted to avoid the duplication. Now, you'd think mozilla-unified would solve these issues, and it would... to some extent. Because that wouldn't cover user repositories and project branches briefly mentioned above, which in GitHub parlance would be considered as Forks. So you'd want a mega global datastore shared by all repositories, and repositories would need to only expose what they really contain. Does Mercurial support that? I don't think so (okay, I'll give you that: even if it doesn't, it could, but that's extra work). And since we're talking about a transition to Git, does Git support that? You may have read about how you can link to a commit from a fork and make-pretend that it comes from the main repository on GitHub? At least, it shows a warning, now. That's essentially the architectural reason why. So the actual answer is that Git doesn't support it out of the box, but GitHub has some backend magic to handle it somehow (and hopefully, other things like Gitea, Girocco, Gitlab, etc. have something similar). Now, to come back to the size of the repository. A repository is not a static file. It's a server with which you negotiate what you have against what it has that you want. Then the server bundles what you asked for based on what you said you have. Or in the opposite direction, you negotiate what you have that it doesn't, you send it, and the server incorporates what you sent it. Fortunately the latter is less frequent and requires authentication. But the former is more frequent and CPU intensive. Especially when pulling a large number of changesets, which, incidentally, cloning is. "But there is a solution for clones" you might say, which is true. That's clonebundles, which offload the CPU intensive part of cloning to a single job scheduled regularly. Guess who implemented it? Mozilla. But that only covers the cloning part. We actually had laid the ground to support offloading large incremental updates and split clones, but that never materialized. Even with all that, that still leaves you with a server that can display file contents, diffs, blames, provide zip archives of a revision, and more, all of which are CPU intensive in their own way. And these endpoints are regularly abused, and cause extra load to your servers, yes plural, because of course a single server won't handle the load for the number of users of your big repositories. And because your endpoints are abused, you have to close some of them. And I'm not mentioning the Try repository with its tens of thousands of heads, which brings its own sets of problems (and it would have even more heads if we didn't fake-merge them once in a while). Of course, all the above applies to Git (and it only gained support for something akin to clonebundles last year). So, when the Firefox OS project was stopped, there wasn't much motivation to continue supporting our own Git server, Mercurial still being the official point of entry, and git.mozilla.org was shut down in 2016. The growing difficulty of maintaining the status quo Slowly, but steadily in more recent years, as new tooling was added that needed some input from the source code manager, support for Git was more and more consistently added. But at the same time, as people left for other endeavors and weren't necessarily replaced, or more recently with layoffs, resources allocated to such tooling have been spread thin. Meanwhile, the repository growth didn't take a break, and the Try repository was becoming an increasing pain, with push times quite often exceeding 10 minutes. The ongoing work to move Try pushes to Lando will hide the problem under the rug, but the underlying problem will still exist (although the last version of Mercurial seems to have improved things). On the flip side, more and more people have been relying on Git for Firefox development, to my own surprise, as I didn't really push for that to happen. It just happened organically, by ways of git-cinnabar existing, providing a compelling experience to those who prefer Git, and, I guess, word of mouth. I was genuinely surprised when I recently heard the use of Git among moz-phab users had surpassed a third. I did, however, occasionally orient people who struggled with Mercurial and said they were more familiar with Git, towards git-cinnabar. I suspect there's a somewhat large number of people who never realized Git was a viable option. But that, on its own, can come with its own challenges: if you use git-cinnabar without being backed by gecko-dev, you'll have a hard time sharing your branches on GitHub, because you can't push to a fork of gecko-dev without pushing your entire local repository, as they have different commit histories. And switching to gecko-dev when you weren't already using it requires some extra work to rebase all your local branches from the old commit history to the new one. Clone times with git-cinnabar have also started to go a little out of hand in the past few years, but this was mitigated in a similar manner as with the Mercurial cloning problem: with static files that are refreshed regularly. Ironically, that made cloning with git-cinnabar faster than cloning with Mercurial. But generating those static files is increasingly time-consuming. As of writing, generating those for mozilla-unified takes close to 7 hours. I was predicting clone times over 10 hours "in 5 years" in a post from 4 years ago, I wasn't too far off. With exponential growth, it could still happen, although to be fair, CPUs have improved since. I will explore the performance aspect in a subsequent blog post, alongside the upcoming release of git-cinnabar 0.7.0-b1. I don't even want to check how long it now takes with hg-git or git-remote-hg (they were already taking more than a day when git-cinnabar was taking a couple hours). I suppose it's about time that I clarify that git-cinnabar has always been a side-project. It hasn't been part of my duties at Mozilla, and the extent to which Mozilla supports git-cinnabar is in the form of taskcluster workers on the community instance for both git-cinnabar CI and generating those clone bundles. Consequently, that makes the above git-cinnabar specific issues a Me problem, rather than a Mozilla problem. Taking the leap I can't talk for the people who made the proposal to move to Git, nor for the people who put a green light on it. But I can at least give my perspective. Developers have regularly asked why Mozilla was still using Mercurial, but I think it was the first time that a formal proposal was laid out. And it came from the Engineering Workflow team, responsible for issue tracking, code reviews, source control, build and more. It's easy to say "Mozilla should have chosen Git in the first place", but back in 2007, GitHub wasn't there, Bitbucket wasn't there, and all the available options were rather new (especially compared to the then 21 years-old CVS). I think Mozilla made the right choice, all things considered. Had they waited a couple years, the story might have been different. You might say that Mozilla stayed with Mercurial for so long because of the sunk cost fallacy. I don't think that's true either. But after the biggest Mercurial repository hosting service turned off Mercurial support, and the main contributor to Mercurial going their own way, it's hard to ignore that the landscape has evolved. And the problems that we regularly encounter with the Mercurial servers are not going to get any better as the repository continues to grow. As far as I know, all the Mercurial repositories bigger than Mozilla's are... not using Mercurial. Google has its own closed-source server, and Facebook has another of its own, and it's not really public either. With resources spread thin, I don't expect Mozilla to be able to continue supporting a Mercurial server indefinitely (although I guess Octobus could be contracted to give a hand, but is that sustainable?). Mozilla, being a champion of Open Source, also doesn't live in a silo. At some point, you have to meet your contributors where they are. And the Open Source world is now majoritarily using Git. I'm sure the vast majority of new hires at Mozilla in the past, say, 5 years, know Git and have had to learn Mercurial (although they arguably didn't need to). Even within Mozilla, with thousands(!) of repositories on GitHub, Firefox is now actually the exception rather than the norm. I should even actually say Desktop Firefox, because even Mobile Firefox lives on GitHub (although Fenix is moving back in together with Desktop Firefox, and the timing is such that that will probably happen before Firefox moves to Git). Heck, even Microsoft moved to Git! With a significant developer base already using Git thanks to git-cinnabar, and all the constraints and problems I mentioned previously, it actually seems natural that a transition (finally) happens. However, had git-cinnabar or something similarly viable not existed, I don't think Mozilla would be in a position to take this decision. On one hand, it probably wouldn't be in the current situation of having to support both Git and Mercurial in the tooling around Firefox, nor the resource constraints related to that. But on the other hand, it would be farther from supporting Git and being able to make the switch in order to address all the other problems. But... GitHub? I hope I made a compelling case that hosting is not as simple as it can seem, at the scale of the Firefox repository. It's also not Mozilla's main focus. Mozilla has enough on its plate with the migration of existing infrastructure that does rely on Mercurial to understandably not want to figure out the hosting part, especially with limited resources, and with the mixed experience hosting both Mercurial and git has been so far. After all, GitHub couldn't even display things like the contributors' graph on gecko-dev until recently, and hosting is literally their job! They still drop the ball on large blames (thankfully we have searchfox for those). Where does that leave us? Gitlab? For those criticizing GitHub for being proprietary, that's probably not open enough. Cloud Source Repositories? "But GitHub is Microsoft" is a complaint I've read a lot after the announcement. Do you think Google hosting would have appealed to these people? Bitbucket? I'm kind of surprised it wasn't in the list of providers that were considered, but I'm also kind of glad it wasn't (and I'll leave it at that). I think the only relatively big hosting provider that could have made the people criticizing the choice of GitHub happy is Codeberg, but I hadn't even heard of it before it was mentioned in response to Mozilla's announcement. But really, with literal thousands of Mozilla repositories already on GitHub, with literal tens of millions repositories on the platform overall, the pragmatic in me can't deny that it's an attractive option (and I can't stress enough that I wasn't remotely close to the room where the discussion about what choice to make happened). "But it's a slippery slope". I can see that being a real concern. LLVM also moved its repository to GitHub (from a (I think) self-hosted Subversion server), and ended up moving off Bugzilla and Phabricator to GitHub issues and PRs four years later. As an occasional contributor to LLVM, I hate this move. I hate the GitHub review UI with a passion. At least, right now, GitHub PRs are not a viable option for Mozilla, for their lack of support for security related PRs, and the more general shortcomings in the review UI. That doesn't mean things won't change in the future, but let's not get too far ahead of ourselves. The move to Git has just been announced, and the migration has not even begun yet. Just because Mozilla is moving the Firefox repository to GitHub doesn't mean it's locked in forever or that all the eggs are going to be thrown into one basket. If bridges need to be crossed in the future, we'll see then. So, what's next? The official announcement said we're not expecting the migration to really begin until six months from now. I'll swim against the current here, and say this: the earlier you can switch to git, the earlier you'll find out what works and what doesn't work for you, whether you already know Git or not. While there is not one unique workflow, here's what I would recommend anyone who wants to take the leap off Mercurial right now: As there is no one-size-fits-all workflow, I won't tell you how to organize yourself from there. I'll just say this: if you know the Mercurial sha1s of your previous local work, you can create branches for them with:
$ git branch <branch_name> $(git cinnabar hg2git <hg_sha1>)
At this point, you should have everything available on the Git side, and you can remove the .hg directory. Or move it into some empty directory somewhere else, just in case. But don't leave it here, it will only confuse the tooling. Artifact builds WILL be confused, though, and you'll have to ./mach configure before being able to do anything. You may also hit bug 1865299 if your working tree is older than this post. If you have any problem or question, you can ping me on #git-cinnabar or #git on Matrix. I'll put the instructions above somewhere on wiki.mozilla.org, and we can collaboratively iterate on them. Now, what the announcement didn't say is that the Git repository WILL NOT be gecko-dev, doesn't exist yet, and WON'T BE COMPATIBLE (trust me, it'll be for the better). Why did I make you do all the above, you ask? Because that won't be a problem. I'll have you covered, I promise. The upcoming release of git-cinnabar 0.7.0-b1 will have a way to smoothly switch between gecko-dev and the future repository (incidentally, that will also allow to switch from a pure git-cinnabar clone to a gecko-dev one, for the git-cinnabar users who have kept reading this far). What about git-cinnabar? With Mercurial going the way of the dodo at Mozilla, my own need for git-cinnabar will vanish. Legitimately, this begs the question whether it will still be maintained. I can't answer for sure. I don't have a crystal ball. However, the needs of the transition itself will motivate me to finish some long-standing things (like finalizing the support for pushing merges, which is currently behind an experimental flag) or implement some missing features (support for creating Mercurial branches). Git-cinnabar started as a Python script, it grew a sidekick implemented in C, which then incorporated some Rust, which then cannibalized the Python script and took its place. It is now close to 90% Rust, and 10% C (if you don't count the code from Git that is statically linked to it), and has sort of become my Rust playground (it's also, I must admit, a mess, because of its history, but it's getting better). So the day to day use with Mercurial is not my sole motivation to keep developing it. If it were, it would stay stagnant, because all the features I need are there, and the speed is not all that bad, although I know it could be better. Arguably, though, git-cinnabar has been relatively stagnant feature-wise, because all the features I need are there. So, no, I don't expect git-cinnabar to die along Mercurial use at Mozilla, but I can't really promise anything either. Final words That was a long post. But there was a lot of ground to cover. And I still skipped over a bunch of things. I hope I didn't bore you to death. If I did and you're still reading... what's wrong with you? ;) So this is the end of Mercurial at Mozilla. So long, and thanks for all the fish. But this is also the beginning of a transition that is not easy, and that will not be without hiccups, I'm sure. So fasten your seatbelts (plural), and welcome the change. To circle back to the clickbait title, did I really kill Mercurial at Mozilla? Of course not. But it's like I stumbled upon a few sparks and tossed a can of gasoline on them. I didn't start the fire, but I sure made it into a proper bonfire... and now it has turned into a wildfire. And who knows? 15 years from now, someone else might be looking back at how Mozilla picked Git at the wrong time, and that, had we waited a little longer, we would have picked some yet to come new horse. But hey, that's the tech cycle for you.

7 November 2023

Melissa Wen: AMD Driver-specific Properties for Color Management on Linux (Part 2)

TL;DR: This blog post explores the color capabilities of AMD hardware and how they are exposed to userspace through driver-specific properties. It discusses the different color blocks in the AMD Display Core Next (DCN) pipeline and their capabilities, such as predefined transfer functions, 1D and 3D lookup tables (LUTs), and color transformation matrices (CTMs). It also highlights the differences in AMD HW blocks for pre and post-blending adjustments, and how these differences are reflected in the available driver-specific properties. Overall, this blog post provides a comprehensive overview of the color capabilities of AMD hardware and how they can be controlled by userspace applications through driver-specific properties. This information is valuable for anyone who wants to develop applications that can take advantage of the AMD color management pipeline. Get a closer look at each hardware block s capabilities, unlock a wealth of knowledge about AMD display hardware, and enhance your understanding of graphics and visual computing. Stay tuned for future developments as we embark on a quest for GPU color capabilities in the ever-evolving realm of rainbow treasures.
Operating Systems can use the power of GPUs to ensure consistent color reproduction across graphics devices. We can use GPU-accelerated color management to manage the diversity of color profiles, do color transformations to convert between High-Dynamic-Range (HDR) and Standard-Dynamic-Range (SDR) content and color enhacements for wide color gamut (WCG). However, to make use of GPU display capabilities, we need an interface between userspace and the kernel display drivers that is currently absent in the Linux/DRM KMS API. In the previous blog post I presented how we are expanding the Linux/DRM color management API to expose specific properties of AMD hardware. Now, I ll guide you to the color features for the Linux/AMD display driver. We embark on a journey through DRM/KMS, AMD Display Manager, and AMD Display Core and delve into the color blocks to uncover the secrets of color manipulation within AMD hardware. Here we ll talk less about the color tools and more about where to find them in the hardware. We resort to driver-specific properties to reach AMD hardware blocks with color capabilities. These blocks display features like predefined transfer functions, color transformation matrices, and 1-dimensional (1D LUT) and 3-dimensional lookup tables (3D LUT). Here, we will understand how these color features are strategically placed into color blocks both before and after blending in Display Pipe and Plane (DPP) and Multiple Pipe/Plane Combined (MPC) blocks. That said, welcome back to the second part of our thrilling journey through AMD s color management realm!

AMD Display Driver in the Linux/DRM Subsystem: The Journey In my 2022 XDC talk I m not an AMD expert, but , I briefly explained the organizational structure of the Linux/AMD display driver where the driver code is bifurcated into a Linux-specific section and a shared-code portion. To reveal AMD s color secrets through the Linux kernel DRM API, our journey led us through these layers of the Linux/AMD display driver s software stack. It includes traversing the DRM/KMS framework, the AMD Display Manager (DM), and the AMD Display Core (DC) [1]. The DRM/KMS framework provides the atomic API for color management through KMS properties represented by struct drm_property. We extended the color management interface exposed to userspace by leveraging existing resources and connecting them with driver-specific functions for managing modeset properties. On the AMD DC layer, the interface with hardware color blocks is established. The AMD DC layer contains OS-agnostic components that are shared across different platforms, making it an invaluable resource. This layer already implements hardware programming and resource management, simplifying the external developer s task. While examining the DC code, we gain insights into the color pipeline and capabilities, even without direct access to specifications. Additionally, AMD developers provide essential support by answering queries and reviewing our work upstream. The primary challenge involved identifying and understanding relevant AMD DC code to configure each color block in the color pipeline. However, the ultimate goal was to bridge the DC color capabilities with the DRM API. For this, we changed the AMD DM, the OS-dependent layer connecting the DC interface to the DRM/KMS framework. We defined and managed driver-specific color properties, facilitated the transport of user space data to the DC, and translated DRM features and settings to the DC interface. Considerations were also made for differences in the color pipeline based on hardware capabilities.

Exploring Color Capabilities of the AMD display hardware Now, let s dive into the exciting realm of AMD color capabilities, where a abundance of techniques and tools await to make your colors look extraordinary across diverse devices. First, we need to know a little about the color transformation and calibration tools and techniques that you can find in different blocks of the AMD hardware. I borrowed some images from [2] [3] [4] to help you understand the information.

Predefined Transfer Functions (Named Fixed Curves): Transfer functions serve as the bridge between the digital and visual worlds, defining the mathematical relationship between digital color values and linear scene/display values and ensuring consistent color reproduction across different devices and media. You can learn more about curves in the chapter GPU Gems 3 - The Importance of Being Linear by Larry Gritz and Eugene d Eon. ITU-R 2100 introduces three main types of transfer functions:
  • OETF: the opto-electronic transfer function, which converts linear scene light into the video signal, typically within a camera.
  • EOTF: electro-optical transfer function, which converts the video signal into the linear light output of the display.
  • OOTF: opto-optical transfer function, which has the role of applying the rendering intent .
AMD s display driver supports the following pre-defined transfer functions (aka named fixed curves):
  • Linear/Unity: linear/identity relationship between pixel value and luminance value;
  • Gamma 2.2, Gamma 2.4, Gamma 2.6: pure power functions;
  • sRGB: 2.4: The piece-wise transfer function from IEC 61966-2-1:1999;
  • BT.709: has a linear segment in the bottom part and then a power function with a 0.45 (~1/2.22) gamma for the rest of the range; standardized by ITU-R BT.709-6;
  • PQ (Perceptual Quantizer): used for HDR display, allows luminance range capability of 0 to 10,000 nits; standardized by SMPTE ST 2084.
These capabilities vary depending on the hardware block, with some utilizing hardcoded curves and others relying on AMD s color module to construct curves from standardized coefficients. It also supports user/custom curves built from a lookup table.

1D LUTs (1-dimensional Lookup Table): A 1D LUT is a versatile tool, defining a one-dimensional color transformation based on a single parameter. It s very well explained by Jeremy Selan at GPU Gems 2 - Chapter 24 Using Lookup Tables to Accelerate Color Transformations It enables adjustments to color, brightness, and contrast, making it ideal for fine-tuning. In the Linux AMD display driver, the atomic API offers a 1D LUT with 4096 entries and 8-bit depth, while legacy gamma uses a size of 256.

3D LUTs (3-dimensional Lookup Table): These tables work in three dimensions red, green, and blue. They re perfect for complex color transformations and adjustments between color channels. It s also more complex to manage and require more computational resources. Jeremy also explains 3D LUT at GPU Gems 2 - Chapter 24 Using Lookup Tables to Accelerate Color Transformations

CTM (Color Transformation Matrices): Color transformation matrices facilitate the transition between different color spaces, playing a crucial role in color space conversion.

HDR Multiplier: HDR multiplier is a factor applied to the color values of an image to increase their overall brightness.

AMD Color Capabilities in the Hardware Pipeline First, let s take a closer look at the AMD Display Core Next hardware pipeline in the Linux kernel documentation for AMDGPU driver - Display Core Next In the AMD Display Core Next hardware pipeline, we encounter two hardware blocks with color capabilities: the Display Pipe and Plane (DPP) and the Multiple Pipe/Plane Combined (MPC). The DPP handles color adjustments per plane before blending, while the MPC engages in post-blending color adjustments. In short, we expect DPP color capabilities to match up with DRM plane properties, and MPC color capabilities to play nice with DRM CRTC properties. Note: here s the catch there are some DRM CRTC color transformations that don t have a corresponding AMD MPC color block, and vice versa. It s like a puzzle, and we re here to solve it!

AMD Color Blocks and Capabilities We can finally talk about the color capabilities of each AMD color block. As it varies based on the generation of hardware, let s take the DCN3+ family as reference. What s possible to do before and after blending depends on hardware capabilities describe in the kernel driver by struct dpp_color_caps and struct mpc_color_caps. The AMD Steam Deck hardware provides a tangible example of these capabilities. Therefore, we take SteamDeck/DCN301 driver as an example and look at the Color pipeline capabilities described in the file: driver/gpu/drm/amd/display/dcn301/dcn301_resources.c
/* Color pipeline capabilities */
dc->caps.color.dpp.dcn_arch = 1; // If it is a Display Core Next (DCN): yes. Zero means DCE.
dc->caps.color.dpp.input_lut_shared = 0;
dc->caps.color.dpp.icsc = 1; // Intput Color Space Conversion  (CSC) matrix.
dc->caps.color.dpp.dgam_ram = 0; // The old degamma block for degamma curve (hardcoded and LUT).  Gamma correction  is the new one.
dc->caps.color.dpp.dgam_rom_caps.srgb = 1; // sRGB hardcoded curve support
dc->caps.color.dpp.dgam_rom_caps.bt2020 = 1; // BT2020 hardcoded curve support (seems not actually in use)
dc->caps.color.dpp.dgam_rom_caps.gamma2_2 = 1; // Gamma 2.2 hardcoded curve support
dc->caps.color.dpp.dgam_rom_caps.pq = 1; // PQ hardcoded curve support
dc->caps.color.dpp.dgam_rom_caps.hlg = 1; // HLG hardcoded curve support
dc->caps.color.dpp.post_csc = 1; // CSC matrix
dc->caps.color.dpp.gamma_corr = 1; // New  Gamma Correction  block for degamma user LUT;
dc->caps.color.dpp.dgam_rom_for_yuv = 0;
dc->caps.color.dpp.hw_3d_lut = 1; // 3D LUT support. If so, it's always preceded by a shaper curve. 
dc->caps.color.dpp.ogam_ram = 1; //  Blend Gamma  block for custom curve just after blending
// no OGAM ROM on DCN301
dc->caps.color.dpp.ogam_rom_caps.srgb = 0;
dc->caps.color.dpp.ogam_rom_caps.bt2020 = 0;
dc->caps.color.dpp.ogam_rom_caps.gamma2_2 = 0;
dc->caps.color.dpp.ogam_rom_caps.pq = 0;
dc->caps.color.dpp.ogam_rom_caps.hlg = 0;
dc->caps.color.dpp.ocsc = 0;
dc->caps.color.mpc.gamut_remap = 1; // Post-blending CTM (pre-blending CTM is always supported)
dc->caps.color.mpc.num_3dluts = pool->base.res_cap->num_mpc_3dlut; // Post-blending 3D LUT (preceded by shaper curve)
dc->caps.color.mpc.ogam_ram = 1; // Post-blending regamma.
// No pre-defined TF supported for regamma.
dc->caps.color.mpc.ogam_rom_caps.srgb = 0;
dc->caps.color.mpc.ogam_rom_caps.bt2020 = 0;
dc->caps.color.mpc.ogam_rom_caps.gamma2_2 = 0;
dc->caps.color.mpc.ogam_rom_caps.pq = 0;
dc->caps.color.mpc.ogam_rom_caps.hlg = 0;
dc->caps.color.mpc.ocsc = 1; // Output CSC matrix.
I included some inline comments in each element of the color caps to quickly describe them, but you can find the same information in the Linux kernel documentation. See more in struct dpp_color_caps, struct mpc_color_caps and struct rom_curve_caps. Now, using this guideline, we go through color capabilities of DPP and MPC blocks and talk more about mapping driver-specific properties to corresponding color blocks.

DPP Color Pipeline: Before Blending (Per Plane) Let s explore the capabilities of DPP blocks and what you can achieve with a color block. The very first thing to pay attention is the display architecture of the display hardware: previously AMD uses a display architecture called DCE
  • Display and Compositing Engine, but newer hardware follows DCN - Display Core Next.
The architectute is described by: dc->caps.color.dpp.dcn_arch

AMD Plane Degamma: TF and 1D LUT Described by: dc->caps.color.dpp.dgam_ram, dc->caps.color.dpp.dgam_rom_caps,dc->caps.color.dpp.gamma_corr AMD Plane Degamma data is mapped to the initial stage of the DPP pipeline. It is utilized to transition from scanout/encoded values to linear values for arithmetic operations. Plane Degamma supports both pre-defined transfer functions and 1D LUTs, depending on the hardware generation. DCN2 and older families handle both types of curve in the Degamma RAM block (dc->caps.color.dpp.dgam_ram); DCN3+ separate hardcoded curves and 1D LUT into two block: Degamma ROM (dc->caps.color.dpp.dgam_rom_caps) and Gamma correction block (dc->caps.color.dpp.gamma_corr), respectively. Pre-defined transfer functions:
  • they are hardcoded curves (read-only memory - ROM);
  • supported curves: sRGB EOTF, BT.709 inverse OETF, PQ EOTF and HLG OETF, Gamma 2.2, Gamma 2.4 and Gamma 2.6 EOTF.
The 1D LUT currently accepts 4096 entries of 8-bit. The data is interpreted as an array of struct drm_color_lut elements. Setting TF = Identity/Default and LUT as NULL means bypass. References:

AMD Plane 3x4 CTM (Color Transformation Matrix) AMD Plane CTM data goes to the DPP Gamut Remap block, supporting a 3x4 fixed point (s31.32) matrix for color space conversions. The data is interpreted as a struct drm_color_ctm_3x4. Setting NULL means bypass. References:

AMD Plane Shaper: TF + 1D LUT Described by: dc->caps.color.dpp.hw_3d_lut The Shaper block fine-tunes color adjustments before applying the 3D LUT, optimizing the use of the limited entries in each dimension of the 3D LUT. On AMD hardware, a 3D LUT always means a preceding shaper 1D LUT used for delinearizing and/or normalizing the color space before applying a 3D LUT, so this entry on DPP color caps dc->caps.color.dpp.hw_3d_lut means support for both shaper 1D LUT and 3D LUT. Pre-defined transfer function enables delinearizing content with or without shaper LUT, where AMD color module calculates the resulted shaper curve. Shaper curves go from linear values to encoded values. If we are already in a non-linear space and/or don t need to normalize values, we can set a Identity TF for shaper that works similar to bypass and is also the default TF value. Pre-defined transfer functions:
  • there is no DPP Shaper ROM. Curves are calculated by AMD color modules. Check calculate_curve() function in the file amd/display/modules/color/color_gamma.c.
  • supported curves: Identity, sRGB inverse EOTF, BT.709 OETF, PQ inverse EOTF, HLG OETF, and Gamma 2.2, Gamma 2.4, Gamma 2.6 inverse EOTF.
The 1D LUT currently accepts 4096 entries of 8-bit. The data is interpreted as an array of struct drm_color_lut elements. When setting Plane Shaper TF (!= Identity) and LUT at the same time, the color module will combine the pre-defined TF and the custom LUT values into the LUT that s actually programmed. Setting TF = Identity/Default and LUT as NULL works as bypass. References:

AMD Plane 3D LUT Described by: dc->caps.color.dpp.hw_3d_lut The 3D LUT in the DPP block facilitates complex color transformations and adjustments. 3D LUT is a three-dimensional array where each element is an RGB triplet. As mentioned before, the dc->caps.color.dpp.hw_3d_lut describe if DPP 3D LUT is supported. The AMD driver-specific property advertise the size of a single dimension via LUT3D_SIZE property. Plane 3D LUT is a blog property where the data is interpreted as an array of struct drm_color_lut elements and the number of entries is LUT3D_SIZE cubic. The array contains samples from the approximated function. Values between samples are estimated by tetrahedral interpolation The array is accessed with three indices, one for each input dimension (color channel), blue being the outermost dimension, red the innermost. This distribution is better visualized when examining the code in [RFC PATCH 5/5] drm/amd/display: Fill 3D LUT from userspace by Alex Hung:
+	for (nib = 0; nib < 17; nib++)  
+		for (nig = 0; nig < 17; nig++)  
+			for (nir = 0; nir < 17; nir++)  
+				ind_lut = 3 * (nib + 17*nig + 289*nir);
+
+				rgb_area[ind].red = rgb_lib[ind_lut + 0];
+				rgb_area[ind].green = rgb_lib[ind_lut + 1];
+				rgb_area[ind].blue = rgb_lib[ind_lut + 2];
+				ind++;
+			 
+		 
+	 
In our driver-specific approach we opted to advertise it s behavior to the userspace instead of implicitly dealing with it in the kernel driver. AMD s hardware supports 3D LUTs with 17-size or 9-size (4913 and 729 entries respectively), and you can choose between 10-bit or 12-bit. In the current driver-specific work we focus on enabling only 17-size 12-bit 3D LUT, as in [PATCH v3 25/32] drm/amd/display: add plane 3D LUT support:
+		/* Stride and bit depth are not programmable by API yet.
+		 * Therefore, only supports 17x17x17 3D LUT (12-bit).
+		 */
+		lut->lut_3d.use_tetrahedral_9 = false;
+		lut->lut_3d.use_12bits = true;
+		lut->state.bits.initialized = 1;
+		__drm_3dlut_to_dc_3dlut(drm_lut, drm_lut3d_size, &lut->lut_3d,
+					lut->lut_3d.use_tetrahedral_9,
+					MAX_COLOR_3DLUT_BITDEPTH);
A refined control of 3D LUT parameters should go through a follow-up version or generic API. Setting 3D LUT to NULL means bypass. References:

AMD Plane Blend/Out Gamma: TF + 1D LUT Described by: dc->caps.color.dpp.ogam_ram The Blend/Out Gamma block applies the final touch-up before blending, allowing users to linearize content after 3D LUT and just before the blending. It supports both 1D LUT and pre-defined TF. We can see Shaper and Blend LUTs as 1D LUTs that are sandwich the 3D LUT. So, if we don t need 3D LUT transformations, we may want to only use Degamma block to linearize and skip Shaper, 3D LUT and Blend. Pre-defined transfer function:
  • there is no DPP Blend ROM. Curves are calculated by AMD color modules;
  • supported curves: Identity, sRGB EOTF, BT.709 inverse OETF, PQ EOTF, HLG inverse OETF, and Gamma 2.2, Gamma 2.4, Gamma 2.6 EOTF.
The 1D LUT currently accepts 4096 entries of 8-bit. The data is interpreted as an array of struct drm_color_lut elements. If plane_blend_tf_property != Identity TF, AMD color module will combine the user LUT values with pre-defined TF into the LUT parameters to be programmed. Setting TF = Identity/Default and LUT to NULL means bypass. References:

MPC Color Pipeline: After Blending (Per CRTC)

DRM CRTC Degamma 1D LUT The degamma lookup table (LUT) for converting framebuffer pixel data before apply the color conversion matrix. The data is interpreted as an array of struct drm_color_lut elements. Setting NULL means bypass. Not really supported. The driver is currently reusing the DPP degamma LUT block (dc->caps.color.dpp.dgam_ram and dc->caps.color.dpp.gamma_corr) for supporting DRM CRTC Degamma LUT, as explaning by [PATCH v3 20/32] drm/amd/display: reject atomic commit if setting both plane and CRTC degamma.

DRM CRTC 3x3 CTM Described by: dc->caps.color.mpc.gamut_remap It sets the current transformation matrix (CTM) apply to pixel data after the lookup through the degamma LUT and before the lookup through the gamma LUT. The data is interpreted as a struct drm_color_ctm. Setting NULL means bypass.

DRM CRTC Gamma 1D LUT + AMD CRTC Gamma TF Described by: dc->caps.color.mpc.ogam_ram After all that, you might still want to convert the content to wire encoding. No worries, in addition to DRM CRTC 1D LUT, we ve got a AMD CRTC gamma transfer function (TF) to make it happen. Possible TF values are defined by enum amdgpu_transfer_function. Pre-defined transfer functions:
  • there is no MPC Gamma ROM. Curves are calculated by AMD color modules.
  • supported curves: Identity, sRGB inverse EOTF, BT.709 OETF, PQ inverse EOTF, HLG OETF, and Gamma 2.2, Gamma 2.4, Gamma 2.6 inverse EOTF.
The 1D LUT currently accepts 4096 entries of 8-bit. The data is interpreted as an array of struct drm_color_lut elements. When setting CRTC Gamma TF (!= Identity) and LUT at the same time, the color module will combine the pre-defined TF and the custom LUT values into the LUT that s actually programmed. Setting TF = Identity/Default and LUT to NULL means bypass. References:

Others

AMD CRTC Shaper and 3D LUT We have previously worked on exposing CRTC shaper and CRTC 3D LUT, but they were removed from the AMD driver-specific color series because they lack userspace case. CRTC shaper and 3D LUT works similar to plane shaper and 3D LUT but after blending (MPC block). The difference here is that setting (not bypass) Shaper and Gamma blocks together are not expected, since both blocks are used to delinearize the input space. In summary, we either set Shaper + 3D LUT or Gamma.

Input and Output Color Space Conversion There are two other color capabilities of AMD display hardware that were integrated to DRM by previous works and worth a brief explanation here. The DC Input CSC sets pre-defined coefficients from the values of DRM plane color_range and color_encoding properties. It is used for color space conversion of the input content. On the other hand, we have de DC Output CSC (OCSC) sets pre-defined coefficients from DRM connector colorspace properties. It is uses for color space conversion of the composed image to the one supported by the sink. References:

The search for rainbow treasures is not over yet If you want to understand a little more about this work, be sure to watch Joshua and I presented two talks at XDC 2023 about AMD/Steam Deck colors on Gamescope: In the time between the first and second part of this blog post, Uma Shashank and Chaitanya Kumar Borah published the plane color pipeline for Intel and Harry Wentland implemented a generic API for DRM based on VKMS support. We discussed these two proposals and the next steps for Color on Linux during the Color Management workshop at XDC 2023 and I briefly shared workshop results in the 2023 XDC lightning talk session. The search for rainbow treasures is not over yet! We plan to meet again next year in the 2024 Display Hackfest in Coru a-Spain (Igalia s HQ) to keep up the pace and continue advancing today s display needs on Linux. Finally, a HUGE thank you to everyone who worked with me on exploring AMD s color capabilities and making them available in userspace.

5 November 2023

Petter Reinholdtsen: Test framework for DocBook processors / formatters

All the books I have published so far has been using DocBook somewhere in the process. For the first book, the source format was DocBook, while for every later book it was an intermediate format used as the stepping stone to be able to present the same manuscript in several formats, on paper, as ebook in ePub format, as a HTML page and as a PDF file either for paper production or for Internet consumption. This is made possible with a wide variety of free software tools with DocBook support in Debian. The source format of later books have been docx via rst, Markdown, Filemaker and Asciidoc, and for all of these I was able to generate a suitable DocBook file for further processing using pandoc, a2x and asciidoctor, as well as rendering using xmlto, dbtoepub, dblatex, docbook-xsl and fop. Most of the books I have published are translated books, with English as the source language. The use of po4a to handle translations using the gettext PO format has been a blessing, but publishing translated books had triggered the need to ensure the DocBook tools handle relevant languages correctly. For every new language I have published, I had to submit patches dblatex, dbtoepub and docbook-xsl fixing incorrect language and country specific issues in the framework themselves. Typically this has been missing keywords like 'figure' or sort ordering of index entries. After a while it became tiresome to only discover issues like this by accident, and I decided to write a DocBook "test framework" exercising various features of DocBook and allowing me to see all features exercised for a given language. It consist of a set of DocBook files, a version 4 book, a version 5 book, a v4 book set, a v4 selection of problematic tables, one v4 testing sidefloat and finally one v4 testing a book of articles. The DocBook files are accompanied with a set of build rules for building PDF using dblatex and docbook-xsl/fop, HTML using xmlto or docbook-xsl and epub using dbtoepub. The result is a set of files visualizing footnotes, indexes, table of content list, figures, formulas and other DocBook features, allowing for a quick review on the completeness of the given locale settings. To build with a different language setting, all one need to do is edit the lang= value in the .xml file to pick a different ISO 639 code value and run 'make'. The test framework source code is available from Codeberg, and a generated set of presentations of the various examples is available as Codeberg static web pages at https://pere.codeberg.page/docbook-example/. Using this test framework I have been able to discover and report several bugs and missing features in various tools, and got a lot of them fixed. For example I got Northern Sami keywords added to both docbook-xsl and dblatex, fixed several typos in Norwegian bokm l and Norwegian Nynorsk, support for non-ascii title IDs added to pandoc, Norwegian index sorting support fixed in xindy and initial Norwegian Bokm l support added to dblatex. Some issues still remains, though. Default index sorting rules are still broken in several tools, so the Norwegian letters , and are more often than not sorted properly in the book index. The test framework recently received some more polish, as part of publishing my latest book. This book contained a lot of fairly complex tables, which exposed bugs in some of the tools. This made me add a new test file with various tables, as well as spend some time to brush up the build rules. My goal is for the test framework to exercise all DocBook features to make it easier to see which features work with different processors, and hopefully get them all to support the full set of DocBook features. Feel free to send patches to extend the test set, and test it with your favorite DocBook processor. Please visit these two URLs to learn more: If you want to learn more on Docbook and translations, I recommend having a look at the the DocBook web site, the DoCookBook site and my earlier blog post on how the Skolelinux project process and translate documentation, a talk I gave earlier this year on how to translate and publish books using free software (Norwegian only). As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

Next.

Previous.